From christian.thalinger at oracle.com  Fri Apr  1 00:22:08 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 31 Mar 2016 14:22:08 -1000
Subject: RFR: 8144964: JVMCI compilations need to be disabled until the
	module system is initialized
In-Reply-To: <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com>
References: <C5EE62B2-9171-4276-B30C-2A073B2D1542@oracle.com>
	<56FCB10C.3020609@oracle.com>
	<1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com>
	<792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com>
Message-ID: <A38235C1-32FE-442F-A353-2C962E786DF4@oracle.com>

Vladimir pointed out a bug.  Of course it should be:

+  bool must_load;
+#if INCLUDE_JVMCI
+  if (EnableJVMCI) {
+    // If JVMCI is enabled we require its classes to be found.
+    must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci);
+  } else
+#endif
+  {
+    must_load = (init_opt < SystemDictionary::Opt);
+  }
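To make the bug concrete, here is a minimal standalone model of the `must_load` predicate. It assumes a simplified, hypothetical `InitOption` enum with only `Pre`, `Opt` and `Jvmci` in that order (the real enum in systemDictionary.hpp has more entries). The only property that matters is that `Opt` sorts before `Jvmci`, so the first version's `init_opt <= SystemDictionary::Jvmci` also made every optional class mandatory.

```cpp
#include <cassert>

// Simplified, hypothetical stand-in for SystemDictionary::InitOption.
// Only the relative order Pre < Opt < Jvmci matters for this sketch.
enum InitOption { Pre, Opt, Jvmci, OPTION_LIMIT };

// First version of the patch: Opt < Jvmci, so optional classes become mandatory.
bool must_load_buggy(InitOption init_opt, bool enable_jvmci) {
  if (enable_jvmci) {
    return init_opt <= Jvmci;
  }
  return init_opt < Opt;
}

// Corrected version: mandatory classes plus the JVMCI classes,
// while Opt classes stay optional.
bool must_load_fixed(InitOption init_opt, bool enable_jvmci) {
  if (enable_jvmci) {
    return (init_opt < Opt) || (init_opt == Jvmci);
  }
  return init_opt < Opt;
}
```

With JVMCI enabled, `must_load_buggy(Opt, true)` is true while `must_load_fixed(Opt, true)` is false, which is exactly the difference between the two versions of the patch.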

> On Mar 31, 2016, at 1:08 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> I found a problem when graal.jar is appended to the boot class path.  Somehow (and I don't know why, yet) in that case jdk.vm.ci classes are not found when trying to preload them and the VM crashes.  We need to make sure the jdk.vm.ci classes are preloaded if the JVMCI is enabled.
> 
> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp
> --- a/src/share/vm/classfile/systemDictionary.cpp	Thu Mar 31 09:16:49 2016 -0700
> +++ b/src/share/vm/classfile/systemDictionary.cpp	Thu Mar 31 13:04:35 2016 -1000
> @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla
>    int  sid  = (info >> CEIL_LG_OPTION_LIMIT);
>    Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid);
>    InstanceKlass** klassp = &_well_known_klasses[id];
> -  bool must_load = (init_opt < SystemDictionary::Opt);
> +
> +  bool must_load;
> +#if INCLUDE_JVMCI
> +  if (EnableJVMCI) {
> +    // If JVMCI is enabled we require its classes to be found.
> +    must_load = (init_opt <= SystemDictionary::Jvmci);
> +  } else
> +#endif
> +  {
> +    must_load = (init_opt < SystemDictionary::Opt);
> +  }
> +
>    if ((*klassp) == NULL) {
>      Klass* k;
>      if (must_load) {
> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp
> --- a/src/share/vm/classfile/systemDictionary.hpp	Thu Mar 31 09:16:49 2016 -0700
> +++ b/src/share/vm/classfile/systemDictionary.hpp	Thu Mar 31 13:04:35 2016 -1000
> @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic {
>  
>      Opt,                        // preload tried; NULL if not present
>  #if INCLUDE_JVMCI
> -    Jvmci,                      // preload tried; error if not present, use only with JVMCI
> +    Jvmci,                      // preload tried; error if not present if JVMCI enabled
>  #endif
>      OPTION_LIMIT,
>      CEIL_LG_OPTION_LIMIT = 2    // OPTION_LIMIT <= (1<<CEIL_LG_OPTION_LIMIT)
> 
> 
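The `CEIL_LG_OPTION_LIMIT` comment is what makes adding `Jvmci` safe: each well-known-class entry packs a symbol id and an init option into one int, which is why the code above recovers the id with `info >> CEIL_LG_OPTION_LIMIT`. The sketch below assumes the option sits in the low bits and uses a simplified enum plus made-up helpers (`pack_info`, `info_sid`, `info_opt`). It illustrates the encoding and is not the HotSpot code.

```cpp
#include <cassert>

// Hypothetical, simplified option enum; the real one lives in systemDictionary.hpp.
enum InitOption { Pre, Opt, Jvmci, OPTION_LIMIT, CEIL_LG_OPTION_LIMIT = 2 };

// Compile-time version of the comment: every option must fit in the low bits.
static_assert(OPTION_LIMIT <= (1 << CEIL_LG_OPTION_LIMIT), "init option does not fit");

// Hypothetical helpers: pack a symbol id and an option into one int, and back.
constexpr int pack_info(int sid, InitOption opt) {
  return (sid << CEIL_LG_OPTION_LIMIT) | opt;
}
constexpr int info_sid(int info) { return info >> CEIL_LG_OPTION_LIMIT; }
constexpr InitOption info_opt(int info) {
  return static_cast<InitOption>(info & ((1 << CEIL_LG_OPTION_LIMIT) - 1));
}
```

Adding options until `OPTION_LIMIT` exceeds 4 would trip the `static_assert`, at which point `CEIL_LG_OPTION_LIMIT` would have to grow.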
>> On Mar 31, 2016, at 11:10 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>> 
>> Thanks, Vladimir.
>> 
>>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>> 
>>> Looks fine.
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>> On 3/30/16 5:01 PM, Christian Thalinger wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8144964
>>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/
>>>> 
>>>> JVMCI compilations need to be disabled until the module system is initialized.  Basically, only allow tier 1-3 compilations until it's up.
>>>> 
>> 
> 
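The rule from the original request, only tier 1-3 compilations until the module system is up, can be pictured as a clamp on the requested level. The enum below mirrors HotSpot's tiered `CompLevel` values (levels 1-3 are C1, level 4 is the top tier where a JVMCI compiler would run). `effective_level` and its flag are hypothetical and do not show how the webrev implements the check.

```cpp
#include <algorithm>
#include <cassert>

// Tier levels as used by tiered compilation.
enum CompLevel {
  CompLevel_none              = 0,  // interpreter
  CompLevel_simple            = 1,  // C1
  CompLevel_limited_profile   = 2,  // C1 with limited profiling
  CompLevel_full_profile      = 3,  // C1 with full profiling
  CompLevel_full_optimization = 4   // top tier (C2, or the JVMCI compiler)
};

// Hypothetical gate: until the module system is initialized, never hand a
// method to the top tier; cap the request at the highest C1 tier instead.
CompLevel effective_level(CompLevel requested, bool module_system_initialized) {
  if (!module_system_initialized) {
    return std::min(requested, CompLevel_full_profile);
  }
  return requested;
}
```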

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160331/0af04b65/attachment-0001.html>

From vladimir.kozlov at oracle.com  Fri Apr  1 00:36:01 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 31 Mar 2016 17:36:01 -0700
Subject: RFR: 8144964: JVMCI compilations need to be disabled until the
	module system is initialized
In-Reply-To: <A38235C1-32FE-442F-A353-2C962E786DF4@oracle.com>
References: <C5EE62B2-9171-4276-B30C-2A073B2D1542@oracle.com>
	<56FCB10C.3020609@oracle.com>
	<1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com>
	<792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com>
	<A38235C1-32FE-442F-A353-2C962E786DF4@oracle.com>
Message-ID: <56FDC271.7090201@oracle.com>

Looks good.

Thanks,
Vladimir

On 3/31/16 5:22 PM, Christian Thalinger wrote:
> Vladimir pointed out a bug.  Of course it should be:
>
> +  bool must_load;
> +#if INCLUDE_JVMCI
> +  if (EnableJVMCI) {
> +    // If JVMCI is enabled we require its classes to be found.
> +    must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci);
> +  } else
> +#endif
> +  {
> +    must_load = (init_opt < SystemDictionary::Opt);
> +  }
>
>> On Mar 31, 2016, at 1:08 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>
>> I found a problem when graal.jar is appended to the boot class path.  Somehow (and I don't know why, yet) in that case
>> jdk.vm.ci classes are not found when trying to preload them and the VM crashes.  We need to make sure the jdk.vm.ci
>> classes are preloaded if the JVMCI is enabled.
>>
>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp
>> --- a/src/share/vm/classfile/systemDictionary.cpp	Thu Mar 31 09:16:49 2016 -0700
>> +++ b/src/share/vm/classfile/systemDictionary.cpp	Thu Mar 31 13:04:35 2016 -1000
>> @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla
>>    int  sid  = (info >> CEIL_LG_OPTION_LIMIT);
>>    Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid);
>>    InstanceKlass** klassp = &_well_known_klasses[id];
>> -  bool must_load = (init_opt < SystemDictionary::Opt);
>> +
>> +  bool must_load;
>> +#if INCLUDE_JVMCI
>> +  if (EnableJVMCI) {
>> +    // If JVMCI is enabled we require its classes to be found.
>> +    must_load = (init_opt <= SystemDictionary::Jvmci);
>> +  } else
>> +#endif
>> +  {
>> +    must_load = (init_opt < SystemDictionary::Opt);
>> +  }
>> +
>>    if ((*klassp) == NULL) {
>>      Klass* k;
>>      if (must_load) {
>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp
>> --- a/src/share/vm/classfile/systemDictionary.hpp	Thu Mar 31 09:16:49 2016 -0700
>> +++ b/src/share/vm/classfile/systemDictionary.hpp	Thu Mar 31 13:04:35 2016 -1000
>> @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic {
>>
>>      Opt,                        // preload tried; NULL if not present
>>  #if INCLUDE_JVMCI
>> -    Jvmci,                      // preload tried; error if not present, use only with JVMCI
>> +    Jvmci,                      // preload tried; error if not present if JVMCI enabled
>>  #endif
>>      OPTION_LIMIT,
>>      CEIL_LG_OPTION_LIMIT = 2    // OPTION_LIMIT <= (1<<CEIL_LG_OPTION_LIMIT)
>>
>>
>>> On Mar 31, 2016, at 11:10 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>>
>>> Thanks, Vladimir.
>>>
>>>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> Looks fine.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/30/16 5:01 PM, Christian Thalinger wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8144964
>>>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/
>>>>>
>>>>> JVMCI compilations need to be disabled until the module system is initialized.  Basically, only allow tier 1-3
>>>>> compilations until it's up.
>>>>>
>>>
>>
>

From christian.thalinger at oracle.com  Fri Apr  1 00:36:59 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 31 Mar 2016 14:36:59 -1000
Subject: RFR: 8144964: JVMCI compilations need to be disabled until the
	module system is initialized
In-Reply-To: <56FDC271.7090201@oracle.com>
References: <C5EE62B2-9171-4276-B30C-2A073B2D1542@oracle.com>
	<56FCB10C.3020609@oracle.com>
	<1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com>
	<792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com>
	<A38235C1-32FE-442F-A353-2C962E786DF4@oracle.com>
	<56FDC271.7090201@oracle.com>
Message-ID: <8E272A3F-4D29-4E31-BAC9-B602745F87A4@oracle.com>

Thank you, Vladimir.

> On Mar 31, 2016, at 2:36 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 3/31/16 5:22 PM, Christian Thalinger wrote:
>> Vladimir pointed out a bug.  Of course it should be:
>> 
>> +  bool must_load;
>> +#if INCLUDE_JVMCI
>> +  if (EnableJVMCI) {
>> +    // If JVMCI is enabled we require its classes to be found.
>> +    must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci);
>> +  } else
>> +#endif
>> +  {
>> +    must_load = (init_opt < SystemDictionary::Opt);
>> +  }
>> 
>>> On Mar 31, 2016, at 1:08 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>> 
>>> I found a problem when graal.jar is appended to the boot class path.  Somehow (and I don't know why, yet) in that case
>>> jdk.vm.ci classes are not found when trying to preload them and the VM crashes. We need to make sure the jdk.vm.ci
>>> classes are preloaded if the JVMCI is enabled.
>>> 
>>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp
>>> --- a/src/share/vm/classfile/systemDictionary.cpp	Thu Mar 31 09:16:49 2016 -0700
>>> +++ b/src/share/vm/classfile/systemDictionary.cpp	Thu Mar 31 13:04:35 2016 -1000
>>> @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla
>>>   int  sid  = (info >> CEIL_LG_OPTION_LIMIT);
>>>   Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid);
>>>   InstanceKlass** klassp = &_well_known_klasses[id];
>>> -  bool must_load = (init_opt < SystemDictionary::Opt);
>>> +
>>> +  bool must_load;
>>> +#if INCLUDE_JVMCI
>>> +  if (EnableJVMCI) {
>>> +    // If JVMCI is enabled we require its classes to be found.
>>> +    must_load = (init_opt <= SystemDictionary::Jvmci);
>>> +  } else
>>> +#endif
>>> +  {
>>> +    must_load = (init_opt < SystemDictionary::Opt);
>>> +  }
>>> +
>>>   if ((*klassp) == NULL) {
>>>     Klass* k;
>>>     if (must_load) {
>>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp
>>> --- a/src/share/vm/classfile/systemDictionary.hpp	Thu Mar 31 09:16:49 2016 -0700
>>> +++ b/src/share/vm/classfile/systemDictionary.hpp	Thu Mar 31 13:04:35 2016 -1000
>>> @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic {
>>> 
>>>     Opt,                        // preload tried; NULL if not present
>>> #if INCLUDE_JVMCI
>>> -    Jvmci,                      // preload tried; error if not present, use only with JVMCI
>>> +    Jvmci,                      // preload tried; error if not present if JVMCI enabled
>>> #endif
>>>     OPTION_LIMIT,
>>>     CEIL_LG_OPTION_LIMIT = 2    // OPTION_LIMIT <= (1<<CEIL_LG_OPTION_LIMIT)
>>> 
>>> 
>>>> On Mar 31, 2016, at 11:10 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>>> 
>>>> Thanks, Vladimir.
>>>> 
>>>>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>> 
>>>>> Looks fine.
>>>>> 
>>>>> Thanks,
>>>>> Vladimir
>>>>> 
>>>>> On 3/30/16 5:01 PM, Christian Thalinger wrote:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8144964
>>>>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/
>>>>>> 
>>>>>> JVMCI compilations need to be disabled until the module system is initialized.  Basically, only allow tier 1-3
>>>>>> compilations until it's up.


From michael.c.berg at intel.com  Fri Apr  1 03:36:45 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 1 Apr 2016 03:36:45 +0000
Subject: CR for RFR 8151573
In-Reply-To: <56FCADE3.20403@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E4C64D@FMSMSX102.amr.corp.intel.com>
	<56E881A9.7070004@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C6A9@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4C6DB@FMSMSX102.amr.corp.intel.com>
	<56E89CA4.8010201@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C719@FMSMSX102.amr.corp.intel.com>
	<56E97EC5.6030608@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C8E8@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4F529@FMSMSX102.amr.corp.intel.com>
	<56FC5852.2030101@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4F60F@FMSMSX102.amr.corp.intel.com>
	<56FCADE3.20403@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E4FBFA@FMSMSX102.amr.corp.intel.com>

Vladimir, I think I have addressed every concern in the latest webrev:

http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/

I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations.  Adding more parameters didn't seem to be a win to get around it.
The code is fully retested with no issues.

Thanks,
Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, March 30, 2016 9:56 PM
To: Berg, Michael C <michael.c.berg at intel.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: CR for RFR 8151573

On 3/30/16 4:57 PM, Berg, Michael C wrote:
> See below for context.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, March 30, 2016 3:51 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; 
> 'hotspot-compiler-dev at openjdk.java.net' 
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: CR for RFR 8151573
>
> Michael,
>
> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes.
>
> multi_version_post_loops() can use is_canonical_main_loop_entry() from 
> 8148754 but you need to modify it to move
> is_Main() assert to other call sites.
>
> <MCB> ok, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also the canonical test is different.  I can leave the name, but it will then be overloaded with two types of functionality.  The graph shape test differs between main and post loops. Perhaps it is better to leave them separate, given they are different?  I will leave this one to last so that we have time to discuss it.

I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)?

>
> I did not get rce'd post loop checks in loopnode.cpp.
>
> <MCB> First I will have to explain what I am doing with do_range_check().  That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning.
> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit.  In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery.  Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical.  Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so or for the optimizer to realize that no such case exists in which case the clean loop becomes the one and only copy.  If we cannot multiversion transform the loop we added we eliminate it.
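A scalar model of the control flow may help here: a check-free copy of the loop runs up to a limit clamped by every array length the body indexes, and the original checked loop finishes the remaining iterations, throwing only if an index really is out of range. In this sketch `std::vector::at` stands in for a range-checked access and `operator[]` for an RCE'd one; `multiversioned_post_loop` is a hypothetical name, and this is not the C2 transformation itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

// Hypothetical model of d[i] = a[i] + b[i] for i in [start, limit).
// Returns how many iterations ran in the check-free copy.
std::size_t multiversioned_post_loop(std::vector<int>& d, const std::vector<int>& a,
                                     const std::vector<int>& b,
                                     std::size_t start, std::size_t limit) {
  // Clamp the limit by every array length the body indexes, so the
  // check-free copy can never touch an out-of-range element.
  std::size_t clean_limit = std::min({limit, d.size(), a.size(), b.size()});
  std::size_t i = start;
  for (; i < clean_limit; ++i) {
    d[i] = a[i] + b[i];  // no range checks needed in this copy
  }
  std::size_t clean_iters = i - start;
  // The original checked loop handles whatever is left; at() throws
  // std::out_of_range where the Java code would throw.
  for (; i < limit; ++i) {
    d.at(i) = a.at(i) + b.at(i);
  }
  return clean_iters;
}
```

For arrays of length 4 and a loop running from 2 to 6, the clean copy executes indices 2 and 3 and the checked loop throws `std::out_of_range` at index 4.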

I understand that you want to make clean copy of main loop when it could be vectorized but before it is unrolled.

You should add a comment in do_range_check() about what you said (looking only for range checks). Also, don't use the old_new struct; use a simple counter and return it as the result (or return a bool from do_range_check()) to indicate that only range checks were in the loop.

Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand, you are looking for post loops which were not created by insert_scalar_rced_post_loop(). Right?
There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
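Marking the loop, as suggested, can be sketched with a flags word on the loop head, similar in spirit to the pre/main/post flags C2 keeps on counted loops. The bit layout, `LoopHead` and the flag names (`RCEPostLoop` and `AtomicPostLoop` are just the names floated in this thread) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop-head flags, one bit per kind of post loop.
enum LoopFlags : std::uint32_t {
  PostLoop       = 1u << 0,
  RCEPostLoop    = 1u << 1,  // clean copy created for range check elimination
  AtomicPostLoop = 1u << 2,  // the "atomic" post loop named in the review
};

struct LoopHead {
  std::uint32_t flags = 0;
  void mark(LoopFlags f)     { flags |= f; }
  bool is(LoopFlags f) const { return (flags & f) != 0; }
  // A regular post loop carries neither special marking.
  bool is_regular_post_loop() const {
    return is(PostLoop) && !is(RCEPostLoop) && !is(AtomicPostLoop);
  }
};
```

Testing a bit is constant time, whereas `has_range_checks(cl)` walks the loop body on every call.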

>
> Swap next checks since has_range_checks() may be expensive scanning loop body:
> +  // only process RCE'd main loops
> +  if (cl->has_range_checks() || !cl->is_main_loop()) return;
>
> <MCB> Ok, makes sense.

Sorry, I made a mistake because you use the same name for two different methods: one is the query cl->has_range_checks(), which checks a flag, and the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my request to swap the checks.
But, please, rename has_range_checks(cl) method to avoid confusion.

Thanks,
Vladimir

>
> Also if there are no checks in loop you will scan loop's body again since no_checks state is not recorded.
> I would suggest scanning the body unconditionally, because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>
> <MCB> I perceive the real problem is don't scan more than once after we check. I will move towards that solution.
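One way to avoid rescanning is to record the scan result on the loop together with a "computed" bit, so later queries return the cached answer. The loop body is modelled as a hypothetical list of node kinds, and all names are illustrative only.

```cpp
#include <cassert>
#include <vector>

enum class NodeKind { RangeCheckIf, OtherIf, Load, Store };  // hypothetical

struct LoopBody {
  std::vector<NodeKind> nodes;
  int scans = 0;                      // how often the body was actually walked
  bool range_checks_computed = false;
  bool range_checks_present = false;

  // Walks the body at most once; later calls return the cached answer.
  bool has_range_checks() {
    if (!range_checks_computed) {
      ++scans;
      range_checks_present = false;
      for (NodeKind k : nodes) {
        if (k == NodeKind::RangeCheckIf) { range_checks_present = true; break; }
      }
      range_checks_computed = true;
    }
    return range_checks_present;
  }
};
```

Any transformation that changes the body afterwards would have to reset `range_checks_computed`.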
>
>
> Why you need local copies?:
>
> -  visited.Clear();
> -  clones.clear();
> +  Arena *a = Thread::current()->resource_area();
> +  VectorSet visited(a);
> +  Node_Stack clones(a, main_head->back_control()->outcnt());
>
> <MCB> I will look into this, and see if it can be cleaned up.
>
>
> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to be set.
>
> <MCB> Ok, I will look into a version without PostLoopInfo.
>
> Thanks,
> Vladimir
>
> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>> Here is an update after full testing, the webrev is:
>>
>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>
>> Please review and comment,
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>> Berg, Michael C
>> Sent: Wednesday, March 16, 2016 10:30 AM
>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; 
>> 'hotspot-compiler-dev at openjdk.java.net'
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: RE: CR for RFR 8151573
>>
>> Putting a hold on the review, retesting everything on my end.
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 16, 2016 8:42 AM
>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>> Subject: Re: CR for RFR 8151573
>>
>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>> Vladimir:
>>>
>>> The reason programmable SIMD depends upon this is that all versions of the final post loop keep their range checks until very late, after register allocation; they might be cleaned up by CFG optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>
>> I understand that we can get some benefits. But in general case they will not be visible.
>>
>>>
>>> With regard to the pre loop: pre loops have special checks too, do they not, requiring flow in many cases?
>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>
>> Yes, after you explained me vector masking I now understand why it could be used for post loop.
>>
>> After thinking about this, I would suggest that you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform-specific only, smaller, and easier to understand and test. Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops.
>>
>> Regards,
>> Vladimir
>>
>>>
>>> Regards,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: CR for RFR 8151573
>>>
>>> As we all know, we can always construct microbenchmarks which show a 30% - 50%
>>> difference, when in real applications we will never see a difference. I still don't see a real
>>> reason why we should spend time and optimize *POST* loops. We already have a vectorized post loop
>>> to improve performance. Note, additional loop opts code will raise its maintenance cost.
>>>
>>> Why "programmable SIMD" depends on it? What about pre-loop?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>> Correction below...
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev
>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>>>> Berg, Michael C
>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: RE: CR for RFR 8151573
>>>>
>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that look like this:
>>>>
>>>>         for(int i = 0; i < process_len; i++)
>>>>         {
>>>>           d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>         }
>>>>
>>>> The above code makes 9 vector ops.
>>>>
>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyways.
>>>> The value process_len is some fraction of the array length in my measurements.  The idea of the metrics is to pose a post loop with a modest number of iterations in it.  For instance, N is the max trip count of the post loop, and N is in 1..VecZ-1, so for float we could do as many as 15 iterations in the fixup loop.
>>>>
>>>> An example would be array_length = 512, process_len is a range of 81..96, we create a VecZ loop which was superunrolled 4 times with vector length 16, or unroll of 64, we align process 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop.  We start that final loop at iteration 81 so we always do at least 1 iteration of fixup, and as many as 15.  If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80.  By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96.  Without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations.
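The iteration counts in this example reduce to simple arithmetic: a fixed prefix of 6 iterations (the {4,1,1} group), followed by either a scalar fixup of 1 to 15 iterations or one masked vector iteration. The check below uses those numbers from the example; `total_iterations` is a made-up helper.

```cpp
#include <cassert>

// Total loop iterations for one execution: a fixed prefix (pre + main +
// vectorized post loop iterations) plus the final fixup loop, which is either
// scalar (one iteration per residual element) or a single masked iteration.
constexpr int total_iterations(int prefix, int residual, bool masked_fixup) {
  return prefix + (masked_fixup ? 1 : residual);
}

// With prefix = 6 and a residual of 1..15, as in the example above:
static_assert(total_iterations(6, 1, false) == 7, "scalar fixup, best case");
static_assert(total_iterations(6, 15, false) == 21, "scalar fixup, worst case");
static_assert(total_iterations(6, 1, true) == 7, "masked fixup");
static_assert(total_iterations(6, 15, true) == 7, "masked fixup");
```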
>>>>
>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>
>>>> I thought it would be easier to do them separately.  Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> Hi Michael,
>>>>
>>>> Changes are significant so they have to be justified. Especially since we are in a later stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>> Hi Folks,
>>>>>
>>>>> I would like to contribute multi-versioning post loops for range 
>>>>> check elimination.  Beforehand cfg optimizations after register 
>>>>> allocation were where post loop optimizations were done for range 
>>>>> checks.  I have added code which produces the desired effect much 
>>>>> earlier by introducing a safe transformation which will minimally 
>>>>> allow a range check free version of the final post loop to execute 
>>>>> up until the point it actually has to take a range check exception 
>>>>> by re-ranging the limit of the rce'd loop, then exit the rce'd 
>>>>> post loop and take the range check exception in the legacy loop's execution if required.
>>>>> If during optimization we discover that we know enough to remove 
>>>>> the range check version of the post loop, mostly by exposing the 
>>>>> load range values into the limit logic of the rce'd post loop, we 
>>>>> will eliminate the range check post loop altogether much like cfg 
>>>>> optimizations did, but much earlier.  This gives optimizations 
>>>>> like programmable SIMD (via SuperWord) the opportunity to 
>>>>> vectorize the rce'd post loops to a single iteration based on mask 
>>>>> vectors which map to the residual iterations. Programmable SIMD 
>>>>> will be a follow on change set utilizing this code to stage its 
>>>>> work. This optimization also exposes the rce'd post loop without flow to other optimizations.
>>>>> Currently I have enabled this optimization for x86 only.  We base 
>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added.
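The "single iteration based on mask vectors" idea can be emulated in portable C++: for a residual of at most 16 elements, one pass over 16 lanes computes only the lanes whose mask bit is set. This is a scalar emulation for illustration, assuming 16 float lanes (one 512-bit vector); real code would use hardware mask registers, and `masked_tail` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 16;  // e.g. 16 floats in a 512-bit vector

// Emulates one masked vector iteration over the tail [start, n),
// computing d[i] = a[i] * b[i]. Requires n - start <= kLanes.
void masked_tail(float* d, const float* a, const float* b,
                 std::size_t start, std::size_t n) {
  assert(n >= start && n - start <= kLanes);
  std::array<bool, kLanes> mask{};
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    mask[lane] = start + lane < n;  // lane is live only inside the tail
  }
  for (std::size_t lane = 0; lane < kLanes; ++lane) {  // the single "vector" step
    if (mask[lane]) {
      d[start + lane] = a[start + lane] * b[start + lane];
    }
  }
}
```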
>>>>>
>>>>> This code was tested as follows:
>>>>>
>>>>>
>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>
>>>>>
>>>>> webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Michael
>>>>>

From igor.veresov at oracle.com  Fri Apr  1 05:44:05 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 31 Mar 2016 22:44:05 -0700
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the
	corresponding x86 hotspot intrinsic
In-Reply-To: <56FD2F47.1000709@azulsystems.com>
References: <56A751AE.9090203@azulsystems.com>
	<A3061E35-6AAA-4D5D-BE88-5C1E33B1E439@oracle.com>
	<DB160C5D-C421-4B63-89A7-286EFFF5D751@azul.com>
	<45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com>
	<56A8BC9D.8060004@azulsystems.com>
	<6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com>
	<56AA2AE4.2090803@azulsystems.com>
	<2538083C-7906-44AA-A074-7DBF5F2D8654@oracle.com>
	<50C14C66-4068-4DD7-BD94-96E37F7C9B0A@oracle.com>
	<56AF85F3.3060802@azulsystems.com>
	<A0C89858-799F-48DD-9773-581A2E372FE9@oracle.com>
	<56BBCBF4.2070504@azulsystems.com>
	<FA4DF022-B8B9-4AED-A499-1D32BBAFA74B@oracle.com>
	<56BD1F7F.3020808@azulsystems.com>
	<B10E8043-4366-480E-9665-40653E223ECF@oracle.com>
	<56E0B770.1@azulsystems.com>
	<D5CE7C53-BA21-4907-99C3-CFDC56A17329@oracle.com>
	<56FD2F47.1000709@azulsystems.com>
Message-ID: <080FF9DB-5B5C-47B7-AC1C-174755C9B826@oracle.com>

Looks good.

igor

> On Mar 31, 2016, at 7:08 AM, Ivan Krylov <ivan at azulsystems.com> wrote:
> 
> I have updated the webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/
> 
> I overlooked the missing LIR_Assembler::on_spin_wait() for non-x86 platforms. I have no access to non-intel boxes
> and hence saw the problems only at integration time.
> c1_LIRAssembler.o: In function `LIR_Assembler::emit_op0(LIR_Op0*)':
> hotspot/src/share/vm/c1/c1_LIRAssembler.cpp:683: undefined reference to `LIR_Assembler::on_spin_wait()'
> 
> So, 3 empty method implementations were added to the corresponding <cpu> files - the top 3 on the webrev above.
> 
> Paul, thanks for identifying those issues.
> 
> Regards,
> 
> Ivan
> 
> 
> 
> 
> On 10/03/2016 04:04, Igor Veresov wrote:
>> Ok, good.
>> 
>> igor
>> 
>>> On Mar 9, 2016, at 3:53 PM, Ivan Krylov <ivan at azulsystems.com> wrote:
>>> 
>>> Paul, Indeed, thanks. I have modified the test.
>>> I also made changes to reflect the fact that onSpinWait is now decided to be placed into j.l.Thread.
>>> 
>>> Igor,
>>> This is a new webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/
>>> This is the diff between previous and this patches (03 vs 04):
>>> http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/diff.txt
>>> 
>>> Thanks,
>>> 
>>> Ivan
>>> 
>>> On 12/02/2016 06:01, Paul Sandoz wrote:
>>>>> On 12 Feb 2016, at 00:55, Ivan Krylov <ivan at azulsystems.com> wrote:
>>>>> 
>>>>> Hi Igor,
>>>>> 
>>>>> Thanks both for your help and your reviews.
>>>>> Here is a new version, tested on mac for c1 and c2:
>>>>> 
>>>>> http://cr.openjdk.java.net/~ikrylov/8147844.hs.03
>>>>> 
>>>> Now that C1 is supported, should the test be updated with a C1-only execution?
>>>> 
>>>> Paul.
> 
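For context on the missing `LIR_Assembler::on_spin_wait()` definitions: on x86 the intrinsic emits a spin-loop hint (the PAUSE instruction), while the other platforms received empty implementations. The same shape can be sketched in user-level C++; `spin_wait_hint` and `spin_until` are hypothetical names, and `_mm_pause` is the x86 intrinsic from `<immintrin.h>`.

```cpp
#include <atomic>
#include <cassert>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define HAVE_X86_PAUSE 1
#endif

// Hypothetical helper: a spin-loop hint on x86, a no-op elsewhere
// (mirroring the empty per-CPU implementations added in the webrev).
inline void spin_wait_hint() {
#ifdef HAVE_X86_PAUSE
  _mm_pause();
#endif
}

// Busy-waits until flag becomes true, hinting the CPU on every iteration.
// Returns the number of spins so callers can observe how long it waited.
long spin_until(const std::atomic<bool>& flag) {
  long spins = 0;
  while (!flag.load(std::memory_order_acquire)) {
    spin_wait_hint();
    ++spins;
  }
  return spins;
}
```

In practice another thread stores `true` with release semantics; the acquire load in the loop then makes that thread's earlier writes visible.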


From rahul.v.raghavan at oracle.com  Fri Apr  1 08:22:59 2016
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Fri, 1 Apr 2016 01:22:59 -0700 (PDT)
Subject: RFR (XXS): 8150690: C++11 user-defined literal syntax in
	jvmciCompilerToVM.cpp
In-Reply-To: <84F04E81-69FB-402F-956B-1C9CD21AD4C2@oracle.com>
References: <99c44ac7-0279-421f-9469-9f5445d1312a@default>
	<84F04E81-69FB-402F-956B-1C9CD21AD4C2@oracle.com>
Message-ID: <00b78f82-b68b-4c42-867f-438efccaa3ba@default>

> -----Original Message-----
> From: Christian Thalinger
> Sent: Friday, April 01, 2016 5:01 AM
> 
> Looks correct.

Thank you Chris.

> 
> > On Mar 30, 2016, at 6:21 PM, Rahul Raghavan <rahul.v.raghavan at oracle.com> wrote:
> >
> > Hi,
> >
> > <bug>: https://bugs.openjdk.java.net/browse/JDK-8150690
> > <webrev>: http://cr.openjdk.java.net/~rraghavan/8150690/webrev.00/
> >
> > - Added space required between literal and identifier for C++11, in CompilerToVM::methods array initializer.
> > (only white space changes)
> > - Confirmed no other similar issues elsewhere in jvmciCompilerToVM.cpp.
> > - This proposed fix is similar to fix done for JDK-8081202, JDK-8135209, JDK-8132969.
> >
> > - Could not try and reconfirm with Visual Studio 2015, but manually confirmed the changes.
> > A related infrastructure/build task is reported separately: JDK-8145549 (to build OpenJDK using Visual Studio 2015 Community edition).
> >
> > - No issues with jprt run (-testset hotspot).
> >
> > Thanks,
> > Rahul
> 
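The underlying C++11 rule: a string literal immediately followed by an identifier is lexed as a single user-defined-literal token, so a macro name glued to a literal is no longer expanded. A small illustration with made-up macros in the style of JNI signature strings:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical macros in the style of JNI signature fragments.
#define OBJECT_SIG "Ljava/lang/Object;"
#define STRING_SIG "Ljava/lang/String;"

// Fine in C++98 and C++11: the spaces keep each literal and macro name as
// separate tokens, and adjacent string literals are concatenated.
static const char* const kSignature = "(" OBJECT_SIG STRING_SIG ")V";

// Ill-formed in C++11: without the space, "("OBJECT_SIG lexes as one
// user-defined literal with suffix OBJECT_SIG, and no such operator exists.
//   static const char* const kBad = "("OBJECT_SIG")V";

bool signature_ok() {
  return std::strcmp(kSignature, "(Ljava/lang/Object;Ljava/lang/String;)V") == 0;
}
```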

From jamsheed.c.m at oracle.com  Fri Apr  1 09:02:04 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Fri, 1 Apr 2016 14:32:04 +0530
Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...)
	failed: a) MT-unsafe modification of inline cache
In-Reply-To: <56F9521E.5020808@oracle.com>
References: <56F456B7.6010104@oracle.com>
	<3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com>
	<56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com>
	<56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com>
	<56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com>
	<56F9521E.5020808@oracle.com>
Message-ID: <56FE390C.20906@oracle.com>

Hi Vladimir Ivanov,

I used an overloaded clearInlineCaches WhiteBox API.

revised webrevs:
hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/
root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/

Best Regards,
Jamsheed

On 3/28/2016 9:17 PM, Vladimir Ivanov wrote:
>>> In addition, it clears this:
>>> void static_stub_Relocation::clear_inline_cache() {
>>>   // Call stub is only used when calling the interpreted code.
>>>   // It does not really need to be cleared, except that we want to clean out the methodoop.
>>>   CompiledStaticCall::set_stub_to_clean(this);
>>
>> I want the assert to catch this issue. If static stubs are cleared, the assert
>> wouldn't fail.
> I see. Then I suggest renaming the method to
> WhiteBox.cleanupInlineCaches() and iterating over the whole code cache
> (don't specify Method*).
>
> void CodeCache::cleanup_inline_caches() {
>   assert_locked_or_safepoint(CodeCache_lock);
>   NMethodIterator iter;
>   while(iter.next_alive()) {
>     iter.method()->cleanup_inline_caches(true);
>   }
> }
>
> Best regards,
> Vladimir Ivanov
>
>
>> -Jamsheed
>>> }
>>>
>>> Best Regards,
>>> Jamsheed
>>>>
>>>> WB_ENTRY(void, WB_ClearInlineCaches(JNIEnv* env, jobject wb))
>>>>   VM_ClearICs clear_ics;
>>>>   VMThread::execute(&clear_ics);
>>>> WB_END
>>>>
>>>> class VM_ClearICs: public VM_Operation {
>>>> ...
>>>>   void doit()         { CodeCache::clear_inline_caches(); }
>>>> ...
>>>> };
>>>>
>>>> void CodeCache::clear_inline_caches() {
>>>>   assert_locked_or_safepoint(CodeCache_lock);
>>>>   NMethodIterator iter;
>>>>   while(iter.next_alive()) {
>>>>     iter.method()->clear_inline_caches();
>>>>   }
>>>> }
>>>>
>>>> void nmethod::clear_inline_caches() {
>>>>   assert(SafepointSynchronize::is_at_safepoint(), "cleaning of IC's only allowed at safepoint");
>>>>   if (is_zombie()) {
>>>>     return;
>>>>   }
>>>>
>>>>   RelocIterator iter(this);
>>>>   while (iter.next()) {
>>>>     iter.reloc()->clear_inline_cache();
>>>>   }
>>>> }
>>>>
>>>> void static_call_Relocation::clear_inline_cache() {
>>>>   // Safe call site info
>>>>   CompiledStaticCall* handler = compiledStaticCall_at(this);
>>>>   handler->set_to_clean();
>>>> }
>>>>
>>>> void opt_virtual_call_Relocation::clear_inline_cache() {
>>>>   // No stubs for ICs
>>>>   // Clean IC
>>>>   ResourceMark rm;
>>>>   CompiledIC* icache = CompiledIC_at(this);
>>>>   icache->set_to_clean();
>>>> }
>>>>
>>>> void virtual_call_Relocation::clear_inline_cache() {
>>>>   // No stubs for ICs
>>>>   // Clean IC
>>>>   ResourceMark rm;
>>>>   CompiledIC* icache = CompiledIC_at(this);
>>>>   icache->set_to_clean();
>>>> }
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>>
>>>>> Best Regards,
>>>>> Jamsheed
>>>>>
>>>>> On 3/26/2016 1:50 PM, Dean Long wrote:
>>>>>> Instead of changing cleanup_inline_caches() to take a new flag, can
>>>>>> you use the existing
>>>>>> clear_inline_caches()?
>>>>>>
>>>>>> dl
>>>>>>
>>>>>> On 3/25/2016 9:54 AM, Jamsheed C m wrote:
>>>>>>> Thank you Chris.
>>>>>>> I have updated the code.
>>>>>>>
>>>>>>> +  if (method == NULL) {
>>>>>>> +    return;
>>>>>>> +  }
>>>>>>> +  nmethod* nm = method->code();
>>>>>>> +  if (nm == NULL || nm->is_unloaded()) {
>>>>>>> +    return;
>>>>>>> +  }
>>>>>>> +  nm->cleanup_inline_caches(true);
>>>>>>> Best Regards,
>>>>>>> Jamsheed
>>>>>>>
>>>>>>> On 3/25/2016 6:58 AM, Christian Thalinger wrote:
>>>>>>>>
>>>>>>>>> On Mar 24, 2016, at 11:05 AM, Jamsheed C m
>>>>>>>>> <jamsheed.c.m at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Request for review,
>>>>>>>>>
>>>>>>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8067247
>>>>>>>>>
>>>>>>>>> webrevs:
>>>>>>>>> fix:
>>>>>>>>>   jdk part: 
>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.jdk.00/
>>>>>>>>>
>>>>>>>>> newly added test case
>>>>>>>>> hotspot part:
>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.00/
>>>>>>>>> under hs-comp/test
>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.00/
>>>>>>>>>
>>>>>>>>> Unit Test: test/compiler/jsr292/misc/gc/MHInvokeTest.java
>>>>>>>>> Testing: JPRT with new test case, with fix, without fix
>>>>>>>>>
>>>>>>>>> Problem Summary: MH.invoke link sites rely on Java code to get
>>>>>>>>> an adapter method. A new method holder class and an adapter
>>>>>>>>> method are created for an MT, and the lform instance is cached.
>>>>>>>>> Normally this cached lform is returned for a later link-site
>>>>>>>>> request with the same MT. When the cached lform gets collected
>>>>>>>>> (due to memory pressure), a new class and method get created for
>>>>>>>>> the same MT, even though the old method holder class and adapter
>>>>>>>>> method are still live.
>>>>>>>>> Fix Summary: Keep a strong reference to the lform instance in the
>>>>>>>>> adapter method holder class of the MT.
>>>>>>>>
>>>>>>>> Wow!  You found the cause of this long-standing issue? Nice.
>>>>>>>> + if (method == NULL) { return; }
>>>>>>>> + nmethod* nm = method->code();
>>>>>>>> + if (nm == NULL) { return; }
>>>>>>>> + if (nm->is_unloaded()) { return; }
>>>>>>>> Please put the return and } on separate lines.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Jamsheed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
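
The problem summary in this thread describes a cache entry that can be reclaimed while code generated from it is still in use, so a second lookup builds a duplicate. As a loose C++ analogy only (the real change is Java code in java.lang.invoke, and every name below is hypothetical), a std::weak_ptr plays the role of the collectable cache entry and a std::shared_ptr field in the holder plays the role of the strong reference added by the fix. Unlike a Java soft reference, a weak_ptr expires as soon as its last owner goes away, so the sketch only models the "after collection" case.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a LambdaForm.
struct Form {
  std::string key;
  int generation;  // distinguishes a re-created form from the original
};

// Hypothetical stand-in for the adapter method holder class.
struct Holder {
  std::shared_ptr<Form> form;  // strong reference: the analogue of the fix
};

class FormCache {
 public:
  // Returns the cached form for `key`, creating a new one only if the
  // cached entry has already been released (the analogue of being collected).
  std::shared_ptr<Form> get(const std::string& key) {
    if (auto existing = cache_[key].lock()) {
      return existing;
    }
    auto created = std::make_shared<Form>(Form{key, ++generation_});
    cache_[key] = created;  // the cache itself only holds a weak reference
    return created;
  }

 private:
  std::map<std::string, std::weak_ptr<Form>> cache_;
  int generation_ = 0;
};

// Looks up the same key twice, dropping the caller's reference in between,
// and returns the generation seen on the second lookup.
inline int relink(bool holder_keeps_strong_ref) {
  FormCache cache;
  Holder holder;
  {
    auto first = cache.get("(Object)Object");
    if (holder_keeps_strong_ref) holder.form = first;
  }  // `first` goes out of scope here
  return cache.get("(Object)Object")->generation;
}
```

Here relink(false) returns 2 (the form was re-created), while relink(true) returns 1 (the original form is reused because the holder keeps it alive).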


From igor.ignatyev at oracle.com  Fri Apr  1 09:22:42 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 1 Apr 2016 12:22:42 +0300
Subject: RFR(S): 8151828: Jittester: array creation node handled
	inproperly in source code visitor for non-int numerical arrays
In-Reply-To: <56FC242C.6030108@oracle.com>
References: <56FC242C.6030108@oracle.com>
Message-ID: <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com>

Hi Dmitrij,

the fix looks good to me

Thanks,
-- Igor
> On Mar 30, 2016, at 10:08 PM, Dmitrij Pochepko <dmitrij.pochepko at oracle.com> wrote:
> 
> Hi,
> 
> please review a small fix for 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays
> 
> The problem was that Arrays.fill calls were generated with mismatched argument types for primitive-type arrays, so compilation of the generated tests failed.
> 
> This fix removes the generation of the respective Arrays.fill calls for primitive types.
> 
> bug: https://bugs.openjdk.java.net/browse/JDK-8151828
> webrev: http://cr.openjdk.java.net/~dpochepk/8151828/webrev.01/
> 
> I've tested the fix locally.
> 
> Thanks,
> Dmitrij
> 
> 


From vladimir.x.ivanov at oracle.com  Fri Apr  1 09:48:53 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 1 Apr 2016 12:48:53 +0300
Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...)
	failed: a) MT-unsafe modification of inline cache
In-Reply-To: <56FE390C.20906@oracle.com>
References: <56F456B7.6010104@oracle.com>
	<3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com>
	<56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com>
	<56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com>
	<56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com>
	<56F9521E.5020808@oracle.com> <56FE390C.20906@oracle.com>
Message-ID: <56FE4405.3080807@oracle.com>

Looks good!

Small detail: the following comment in the test is misleading:

   71         test();  // new LF creation should fail.

The LF shouldn't be unloaded, so normally no new LF is instantiated.

Something like the following:
   // Trigger call site re-resolution. Invoker LambdaForm should stay the same.
   test();

No need to send new webrev.

Best regards,
Vladimir Ivanov

On 4/1/16 12:02 PM, Jamsheed C m wrote:
> Hi Vladimir Ivanov,
>
> I used an overloaded clearInlineCaches WhiteBox API.
>
> revised webrevs:
> hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/
> root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/
>
> Best Regards,
> Jamsheed

From jamsheed.c.m at oracle.com  Fri Apr  1 10:05:49 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Fri, 1 Apr 2016 15:35:49 +0530
Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...)
	failed: a) MT-unsafe modification of inline cache
In-Reply-To: <56FE4405.3080807@oracle.com>
References: <56F456B7.6010104@oracle.com>
	<3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com>
	<56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com>
	<56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com>
	<56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com>
	<56F9521E.5020808@oracle.com> <56FE390C.20906@oracle.com>
	<56FE4405.3080807@oracle.com>
Message-ID: <56FE47FD.2010905@oracle.com>

Sure. Thank you Vladimir Ivanov!

Best Regards,
Jamsheed

On 4/1/2016 3:18 PM, Vladimir Ivanov wrote:
> Looks good!
>
> Small detail: the following comment in the test is misleading:
>
>   71         test();  // new LF creation should fail.
>
> The LF shouldn't be unloaded, so normally no new LF is instantiated.
>
> Something like the following:
>   // Trigger call site re-resolution. Invoker LambdaForm should stay the same.
>   test();
>
> No need to send new webrev.
>
> Best regards,
> Vladimir Ivanov


From aleksey.shipilev at oracle.com  Fri Apr  1 11:35:28 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 1 Apr 2016 14:35:28 +0300
Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not
	assume asserts are benign
Message-ID: <56FE5D00.7000209@oracle.com>

Hi,

compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify
String Concat strategies, because some of them load new methods
and use them during String concat linkage and execution. Notably, this
happens inside the asserts. We need to prime the asserts before
using them between counter polls.

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8153265

Webrev:
  http://cr.openjdk.java.net/~shade/8153265/webrev.00/

Testing: offending test in oob/-Xcomp modes

Thanks,
-Aleksey


From dmitrij.pochepko at oracle.com  Fri Apr  1 12:27:10 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Fri, 1 Apr 2016 15:27:10 +0300
Subject: RFR(S): 8151828: Jittester: array creation node handled
	inproperly in source code visitor for non-int numerical arrays
In-Reply-To: <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com>
References: <56FC242C.6030108@oracle.com>
	<3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com>
Message-ID: <56FE691E.2050208@oracle.com>

Thank you!
> Hi Dmitrij,
>
> the fix looks good to me
>
> Thanks,
> -- Igor


From zoltan.majo at oracle.com  Fri Apr  1 12:32:01 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 1 Apr 2016 14:32:01 +0200
Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop
	optimizations to 'develop'
In-Reply-To: <56FD4906.4040909@oracle.com>
References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com>
	<56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com>
Message-ID: <56FE6A41.6030703@oracle.com>

Hi Vladimir,


thank you for the feedback!

On 03/31/2016 05:57 PM, Vladimir Kozlov wrote:
> It is nice to have non-product flags, which are easy to remove :)

Yes, indeed. :-)

>
> Clean up looks good.

Thank you.

>
> Can you leave the test but remove only "-XX:+UnlockDiagnosticVMOptions
> -XX:-LoopLimitCheck"? It has an interesting code shape. Add a comment
> that it was run with "-XX:+UnlockDiagnosticVMOptions
> -XX:-LoopLimitCheck" to trigger the problem.

Yes, of course. Here is the updated webrev:
http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/

Thank you!

Best regards,


Zoltan

>
> Thanks,
> Vladimir
>
> On 3/31/16 1:15 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for your feedback!
>>
>> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote:
>>> These flags were added when I fixed a long-standing C2 problem with
>>> counted loops: 5091921.
>>> They were added to have the ability to revert to the original code if
>>> the new code caused a problem.
>>> Looks like the old code, which is executed with these flags switched off,
>>> has become rotten.
>>>
>>> Zoltan, did you find what causes the crash? Looks like a product VM was
>>> used in the bug report. What result does a
>>> fastdebug VM give?
>>
>> I've tried starting different VM versions with the flag(s) off. The 
>> most frequent error I get is
>>
>> # Internal Error (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), pid=32727, tid=32746
>> # assert(false) failed: Bad graph detected in build_loop_late
>>
>> So it seems that the code executed with the flags off has indeed 
>> become rotten.
>>
>>> Converting the flags to develop will not prevent problems with the
>>> fastdebug VM, where these flags could still be switched
>>> off even when they are develop.
>>>
>>> If the problem with the original code (flags off) is something
>>> fundamental, we may simply remove the old code and these
>>> flags and keep only the new code. 5 years have already passed since
>>> 5091921 was fixed.
>>
>> Yes, I agree. I think it's reasonable to remove the old code.
>>
>> Here is the new webrev:
>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/
>>
>> The changes pass JPRT.
>>
>> I've changed the title of the bug to "Cleanup: Remove some unused 
>> flags/code in loop optimizations" to better reflect
>> what the change is doing. I have kept the original title in the RFR.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/23/16 6:26 AM, Zoltán Majó wrote:
>>>> Hi,
>>>>
>>>>
>>>> please review the patch for 8072422.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8072422
>>>>
>>>> Problem: Some flags controlling loop optimizations are currently
>>>> 'diagnostic'. Even though these flags are useful
>>>> mostly for compiler-related development, their value can be changed
>>>> not only in
>>>> fastdebug, but also in release builds.
>>>>
>>>> Solution: Change the flags to 'develop'.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/
>>>>
>>>> Testing:
>>>> - locally built/started VM;
>>>> - locally executed 
>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java.
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>


From martin.doerr at sap.com  Fri Apr  1 12:37:30 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 1 Apr 2016 12:37:30 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
Message-ID: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>

Hello everyone,

we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.

The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.)

I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
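
The publication pattern described above (lock-serialized writers, lock-free readers, release store of the new list head) can be sketched with C++11 atomics. This is a simplified, hypothetical model: HotSpot uses its own OrderAccess primitives rather than std::atomic, and the real exception cache entries are more elaborate. The reader below uses acquire, the conservative choice; the message above argues that address-dependency ordering (what C++11 calls memory_order_consume) is sufficient on all OpenJDK platforms.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified cache entry: maps a key (standing in for the
// exception type) to a handler address. Entries are never modified after
// they have been published.
struct CacheEntry {
  int exception_type;
  int handler_pc;
  CacheEntry* next;
};

class ExceptionCacheSketch {
 public:
  ExceptionCacheSketch() = default;
  ExceptionCacheSketch(const ExceptionCacheSketch&) = delete;
  ExceptionCacheSketch& operator=(const ExceptionCacheSketch&) = delete;

  ~ExceptionCacheSketch() {
    // Single-threaded teardown: free the whole list.
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* next = e->next;
      delete e;
      e = next;
    }
  }

  // Writers are serialized by a lock.
  void add(int exception_type, int handler_pc) {
    std::lock_guard<std::mutex> guard(lock_);
    // Fully initialize the new entry before it becomes reachable.
    CacheEntry* e = new CacheEntry{exception_type, handler_pc,
                                   head_.load(std::memory_order_relaxed)};
    // Release store: a reader that observes `e` also observes its fields.
    head_.store(e, std::memory_order_release);
  }

  // Lock-free reader; may run concurrently with add().
  int lookup(int exception_type) const {
    for (const CacheEntry* e = head_.load(std::memory_order_acquire);
         e != nullptr; e = e->next) {
      if (e->exception_type == exception_type) {
        return e->handler_pc;
      }
    }
    return -1;  // not cached
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex lock_;
};
```

Without the release store, a weakly ordered CPU may make the new head pointer visible before the entry's fields, which is the stale read described above.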

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/

Please review. I will also need a sponsor.

Best regards,
Martin


From nils.eliasson at oracle.com  Fri Apr  1 13:55:01 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 1 Apr 2016 15:55:01 +0200
Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to
	compile method
Message-ID: <56FE7DB5.404@oracle.com>

Hi all,

Please review this fix.

Summary:
There is a mismatch in the CompilerWhiteBox test cases between the
callable and the executable constructors. SimpleTestCase$Helper
implements all constructors and methods that are tested. However, since
Helper is an inner class, there will be an extra (javac-created)
constructor that has the parent class as an appended argument. The
callable will invoke this constructor, but the executable will reference
the normal constructor.

Solution:
Stop having Helper as an inner class. Rename it to
SimpleTestCaseHelper so that the name is unique in compiler commands and
directives.

Testing:
Run all hotspot/compiler/whitebox tests on all platforms, and all 
hotspot/compiler tests on one platform.

Bug: https://bugs.openjdk.java.net/browse/JDK-8151880
Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/

Best regards,
Nils Eliasson

From aleksey.shipilev at oracle.com  Fri Apr  1 14:37:54 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 1 Apr 2016 17:37:54 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <56F5676D.7020401@oracle.com>
References: <56F5676D.7020401@oracle.com>
Message-ID: <56FE87C2.50002@oracle.com>

On 03/25/2016 07:29 PM, Aleksey Shipilev wrote:
> I would like to solicit comments for C1 support for new
> Unsafe.compareAndExchange intrinsics (we have support for them in C2).
> The rest of new Unsafe methods that are not intrinsified by C1 are
> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be
> emulated with existing APIs.
> 
> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8152753
> 
> Webrev:
>   http://cr.openjdk.java.net/~shade/8152753/webrev.00/

Update:
  http://cr.openjdk.java.net/~shade/8152753/webrev.01/

Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some
other cleanups.
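
The key difference from the existing boolean CAS methods is the return value: compareAndExchange returns the witness value (the value actually found in memory) rather than a success flag, and a boolean CAS followed by a separate read cannot recover that value atomically. A minimal C++11 sketch of the two semantics with std::atomic (the function names are illustrative, not the Unsafe or C1 code):

```cpp
#include <atomic>
#include <cassert>

// Compare-and-exchange: atomically replace the value with `desired` if it
// equals `expected`, and return the value that was actually observed
// (the "witness"), whether or not the swap happened.
inline int compare_and_exchange(std::atomic<int>& cell, int expected, int desired) {
  // compare_exchange_strong leaves `expected` unchanged on success and
  // overwrites it with the observed value on failure, so afterwards it
  // always holds the witness.
  cell.compare_exchange_strong(expected, desired);
  return expected;
}

// Compare-and-set only reports success or failure; the witness is lost.
inline bool compare_and_set(std::atomic<int>& cell, int expected, int desired) {
  return cell.compare_exchange_strong(expected, desired);
}
```

On success both report the old value implicitly, but on failure only compare_and_exchange tells the caller what the competing value was.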

Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT
hs-comp testset (some unrelated timeouts on SPARC).

Thanks,
-Aleksey


From christian.thalinger at oracle.com  Fri Apr  1 16:33:35 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 1 Apr 2016 06:33:35 -1000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
Message-ID: <E546C97E-4A4F-4A15-B989-A7320A106508@oracle.com>


> On Apr 1, 2016, at 2:37 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Hello everyone,
>  
> we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.
>  
> The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
> Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.)
>  
> I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
>  
> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/
Does it make sense to keep:
   void set_exception_cache(ExceptionCache *ec)    { _exception_cache = ec; }
or would it be safer to always do the store-release even when clearing the cache?

>  
> Please review. I will also need a sponsor.
>  
> Best regards,
> Martin

From aph at redhat.com  Fri Apr  1 16:42:38 2016
From: aph at redhat.com (Andrew Haley)
Date: Fri, 1 Apr 2016 17:42:38 +0100
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
Message-ID: <56FEA4FE.2010807@redhat.com>

On 04/01/2016 01:37 PM, Doerr, Martin wrote:

> Therefore, the nmethod's field _exception_cache needs to be volatile
> and adding new entries must be done by releasing stores. (Loading
> seems to be fine without acquire because there's an address
> dependency from the load of the cache to the usage of its contents
> which is sufficient to ensure ordering on all openjdk platforms.)

I think that's very risky.  We can't be really sure what an optimizer
might do in this area, as discussed at (very) considerable length in
concurrency forums.  memory_order_consume does this correctly in C++11
but we're not yet using C++11.  I'd use acquire and leave a note that
in future this can be replaced by memory_order_consume.
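
A minimal C++11 sketch of the pattern being discussed, using `std::atomic` rather than HotSpot's `OrderAccess`. The names `CacheEntry` and `ExceptionCacheSketch` are hypothetical, and the structure is simplified to a prepend-only list. Writers are serialized by a lock and publish each fully initialized entry with a release store; lock-free readers load the list head with acquire, following the suggestion above.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified cache entry: maps one pc to one handler address.
struct CacheEntry {
  const void* pc;
  const void* handler;
  CacheEntry* next;
};

class ExceptionCacheSketch {
 public:
  // Writers: serialized by the lock; an entry is never modified after it is published.
  void add(CacheEntry* e) {
    assert(e != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    // Relaxed is enough here: every writer holds the lock.
    e->next = head_.load(std::memory_order_relaxed);
    // Release store: all writes to *e happen-before any reader that
    // observes this pointer through an acquire load.
    head_.store(e, std::memory_order_release);
  }

  // Readers: no lock. The acquire load pairs with the release store in add().
  // memory_order_consume would express the address-dependency ordering,
  // but mainstream compilers currently implement consume as acquire.
  const void* lookup(const void* pc) const {
    for (CacheEntry* e = head_.load(std::memory_order_acquire); e != nullptr;
         e = e->next) {
      if (e->pc == pc) return e->handler;
    }
    return nullptr;
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex lock_;
};
```

Entries are never unlinked or freed in this sketch; removing or reclaiming entries while lock-free readers may still be traversing them is a separate problem it does not address.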

Andrew.

From igor.veresov at oracle.com  Fri Apr  1 18:28:35 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Fri, 1 Apr 2016 11:28:35 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
Message-ID: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>

When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, link time seems like a better place for it, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on link-time resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI), which are affected).
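
A minimal C++ sketch of why the check can move to link time, under simplifying assumptions: `ResolvedMethod`, `IncompatibleClassChangeError` and `linktime_resolve_interface_call` are hypothetical stand-ins, not HotSpot's `LinkResolver` API. The rejection looks only at the resolved method, so it can be made before any receiver is known, which is also when a compiler's parse-time lookup runs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified view of a resolved interface method.
struct ResolvedMethod {
  std::string name;
  bool is_private;
};

// Stand-in for java.lang.IncompatibleClassChangeError.
struct IncompatibleClassChangeError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Link-time step for invokeinterface: the check depends only on the
// resolved method, never on a receiver, so it does not have to wait
// for the runtime (receiver-based) part of resolution.
const ResolvedMethod& linktime_resolve_interface_call(const ResolvedMethod& m) {
  if (m.is_private) {
    throw IncompatibleClassChangeError("private interface method '" + m.name +
                                       "' invoked via invokeinterface");
  }
  return m;
}
```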

JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/

Thanks,
igor

From tom.rodriguez at oracle.com  Fri Apr  1 19:47:25 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 1 Apr 2016 12:47:25 -0700
Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should
	return dependencies_failed
Message-ID: <EFA1C857-34E1-4131-AF1A-FEF515A7D6FB@oracle.com>

http://cr.openjdk.java.net/~never/8153315/webrev

This fixes a minor issue which showed up while debugging Java code.  evol_method dependences can change at any time, so it's just a normal dependence failure, not invalid dependencies.  Graal considers it an error to build invalid dependencies, so it complained.  Tested under the Eclipse debugger.

tom

From igor.veresov at oracle.com  Fri Apr  1 20:18:48 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Fri, 1 Apr 2016 13:18:48 -0700
Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should
	return dependencies_failed
In-Reply-To: <EFA1C857-34E1-4131-AF1A-FEF515A7D6FB@oracle.com>
References: <EFA1C857-34E1-4131-AF1A-FEF515A7D6FB@oracle.com>
Message-ID: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com>

Looks good.

igor

> On Apr 1, 2016, at 12:47 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~never/8153315/webrev
> 
> This fixes a minor issue which showed up while debugging Java code.  evol_method dependences can change at any time, so it's just a normal dependence failure, not invalid dependencies.  Graal considers it an error to build invalid dependencies, so it complained.  Tested under the Eclipse debugger.
> 
> tom


From michael.c.berg at intel.com  Fri Apr  1 21:51:14 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 1 Apr 2016 21:51:14 +0000
Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler
Message-ID: <C568518E7B433348B114B6A7122D474756E4FEAB@FMSMSX102.amr.corp.intel.com>

Hi All,

I would like to contribute some cleanup of the x86 assembler's VEX encoding to address the usage of the nds assembler parameter.
For all instructions which use nds source xmm registers, the validity check has been removed.  It was originally placed here:

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269

It was then propagated to other instructions.  Now nds register usage is fully compliant with each ISA description.
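
For context, the nds (non-destructive source) operand of a VEX-encoded instruction lives in the VEX.vvvv field, stored in inverted (ones' complement) form, and an instruction with no such operand must encode vvvv as 1111b. The sketch below computes the last byte of a 3-byte VEX prefix; the function name and the convention of passing -1 for "no nds register" are assumptions for illustration, not the HotSpot assembler's interface.

```cpp
#include <cassert>
#include <cstdint>

// Builds the third byte of a 3-byte VEX prefix (C4 <byte1> <byte2>):
//   bit 7    : VEX.W
//   bits 6:3 : VEX.vvvv, the nds register number in inverted form
//   bit 2    : VEX.L (0 = 128-bit, 1 = 256-bit)
//   bits 1:0 : VEX.pp (implied prefix: 00 none, 01 66, 10 F3, 11 F2)
// nds_enc is the nds register number (0..15), or -1 when the instruction
// has no nds operand; both -1 and 0 yield vvvv = 1111b.
std::uint8_t vex3_byte2(bool w, int nds_enc, bool vector256, std::uint8_t pp) {
  assert(nds_enc >= -1 && nds_enc <= 15);
  assert(pp <= 3);
  int enc = (nds_enc < 0) ? 0 : nds_enc;
  std::uint8_t vvvv = static_cast<std::uint8_t>(~enc & 0xF);
  return static_cast<std::uint8_t>((w ? 0x80 : 0x00) | (vvvv << 3) |
                                   (vector256 ? 0x04 : 0x00) | pp);
}
```

For example, in `vaddps xmm1, xmm2, xmm3` the nds register is xmm2, so vvvv = ~2 & 0xF = 1101b.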

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001
webrev:
http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/

Thanks,
Michael

From vladimir.kozlov at oracle.com  Sat Apr  2 03:18:28 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 1 Apr 2016 20:18:28 -0700
Subject: CR for RFR 8151573
In-Reply-To: <C568518E7B433348B114B6A7122D474756E4FBFA@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E4C64D@FMSMSX102.amr.corp.intel.com>
	<56E881A9.7070004@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C6A9@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4C6DB@FMSMSX102.amr.corp.intel.com>
	<56E89CA4.8010201@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C719@FMSMSX102.amr.corp.intel.com>
	<56E97EC5.6030608@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C8E8@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4F529@FMSMSX102.amr.corp.intel.com>
	<56FC5852.2030101@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4F60F@FMSMSX102.amr.corp.intel.com>
	<56FCADE3.20403@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4FBFA@FMSMSX102.amr.corp.intel.com>
Message-ID: <56FF3A04.5090601@oracle.com>

I am starting pre-integration testing.

Thanks,
Vladimir

On 3/31/16 8:36 PM, Berg, Michael C wrote:
> Vladimir, I think I have addressed every concern in the latest webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/
>
> I wound up leaving divergent local copies of the clone code, as there are now two fully separate contexts in three locations.  Adding more parameters to get around it didn't seem to be a win.
> The code is fully retested with no issues.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, March 30, 2016 9:56 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: CR for RFR 8151573
>
> On 3/30/16 4:57 PM, Berg, Michael C wrote:
>> See below for context.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 30, 2016 3:51 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>;
>> 'hotspot-compiler-dev at openjdk.java.net'
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: CR for RFR 8151573
>>
>> Michael,
>>
>> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes.
>>
>> multi_version_post_loops() can use is_canonical_main_loop_entry() from 8148754, but you need to modify it to move the is_Main() assert to the other call sites.
>>
>> <MCB> OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also, the canonical test is different.  I can leave the name, but it will then be overloaded with two types of functionality.  The graph shape test changes between main and post loops. Perhaps it is better to leave them separate, given they are different?  I will leave this one to last so that we have time to discuss it.
>
> I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)?
>
>>
>> I did not get rce'd post loop checks in loopnode.cpp.
>>
>> <MCB> First I will have to explain what I am doing with do_range_check().  That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit.  In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery.  Then insert_scalar_rced_post_loop(...) creates a copy of main, which successfully passed range check elimination and is canonical.  Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that it never executes an out-of-range index, leaving such indices to the final loop, unless the optimizer realizes that no such case exists, in which case the clean loop becomes the one and only copy.  If we cannot apply the multiversion transformation to the loop we added, we eliminate it.
>
> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>
> You should add a comment in do_range_check() about what you said (looking only for range checks). Also, don't use the old_new struct; use a simple counter and return it as the result (or return a bool from do_range_check()) to indicate that only range checks were in the loop.
>
> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand, you are looking for post loops which were not created by insert_scalar_rced_post_loop(). Right?
> There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
> Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
>
>>
>> Swap next checks since has_range_checks() may be expensive scanning loop body:
>> +  // only process RCE'd main loops
>> +  if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>
>> <MCB> Ok, makes sense.
>
> Sorry, I made a mistake because you use the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag. The other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But, please, rename has_range_checks(cl) method to avoid confusion.
>
> Thanks,
> Vladimir
>
>>
>> Also, if there are no checks in the loop, you will scan the loop's body again, since the no_checks state is not recorded.
>> I would suggest scanning the body unconditionally, because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>
>> <MCB> I think the real problem is to avoid scanning more than once after we check. I will move towards that solution.
>>
>>
>> Why do you need local copies?
>>
>> -  visited.Clear();
>> -  clones.clear();
>> +  Arena *a = Thread::current()->resource_area();
>> +  VectorSet visited(a);
>> +  Node_Stack clones(a, main_head->back_control()->outcnt());
>>
>> <MCB> I will look into this, and see if it can be cleaned up.
>>
>>
>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set.
>>
>> <MCB> Ok, I will look into a version without PostLoopInfo.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>>> Here is an update after full testing, the webrev is:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>>
>>> Please review and comment,
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>> Berg, Michael C
>>> Sent: Wednesday, March 16, 2016 10:30 AM
>>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>;
>>> 'hotspot-compiler-dev at openjdk.java.net'
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: RE: CR for RFR 8151573
>>>
>>> Putting a hold on the review, retesting everything on my end.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, March 16, 2016 8:42 AM
>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: CR for RFR 8151573
>>>
>>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>>> Vladimir:
>>>>
>>>> Programmable SIMD depends on this because all versions of the final post loop keep their range checks until very late, after register allocation; they might be cleaned up by CFG optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>>
>>> I understand that we can get some benefits. But in the general case they will not be visible.
>>>
>>>>
>>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases?
>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>>
>>> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop.
>>>
>>> After thinking about this, I would suggest that you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform specific only, smaller, and easier to understand and test. Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops.
>>>
>>> Regards,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> As we all know, we can always construct microbenchmarks which show
>>>> a 30% - 50% difference, while in real applications we will never
>>>> see the difference. I still don't see a real reason why we should
>>>> spend time optimizing *POST* loops. We already have a vectorized
>>>> post loop to improve performance. Note that additional loop opts
>>>> code will raise its maintenance cost.
>>>>
>>>> Why "programmable SIMD" depends on it? What about pre-loop?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>>> Correction below...
>>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-compiler-dev
>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>>>> Berg, Michael C
>>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: RE: CR for RFR 8151573
>>>>>
>>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and on code in general that looks like this:
>>>>>
>>>>>          for(int i = 0; i < process_len; i++)
>>>>>          {
>>>>>            d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>>          }
>>>>>
>>>>> The above code makes 9 vector ops.
>>>>>
>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway.
>>>>> The value process_len is some fraction of the array length in my measurements.  The idea of the metrics is to pose a post loop with a modest number of iterations in it.  For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop.
>>>>>
>>>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was super-unrolled 4 times with vector length 16, i.e. an unroll of 64; the alignment (pre) loop processes 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work to the final post loop, in this case possibly a multiversioned post loop.  We start that final loop at iteration 81, so we always do at least 1 iteration of fixup, and as many as 15.  If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80.  By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations.
>>>>>
>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>>
>>>>> I thought it would be easier to do them separately.  Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: Re: CR for RFR 8151573
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> I would like to contribute multiversioning of post loops for range
>>>>>> check elimination.  Until now, range checks in post loops were
>>>>>> only optimized by the CFG optimizations that run after register
>>>>>> allocation.  I have added code which produces the desired effect
>>>>>> much earlier by introducing a safe transformation which, at a
>>>>>> minimum, allows a range-check-free version of the final post loop
>>>>>> to execute up to the point where it would actually have to take a
>>>>>> range check exception: the limit of the rce'd loop is re-ranged,
>>>>>> the rce'd post loop exits there, and the legacy loop takes the
>>>>>> range check exception during its execution if required.
>>>>>> If during optimization we discover that we know enough to remove
>>>>>> the range-checked version of the post loop, mostly by exposing the
>>>>>> load range values to the limit logic of the rce'd post loop, we
>>>>>> eliminate the range-checked post loop altogether, much like the CFG
>>>>>> optimizations did, but much earlier.  This gives optimizations
>>>>>> like programmable SIMD (via SuperWord) the opportunity to
>>>>>> vectorize the rce'd post loops to a single iteration based on mask
>>>>>> vectors which map to the residual iterations. Programmable SIMD
>>>>>> will be a follow-on change set utilizing this code to stage its
>>>>>> work. This optimization also exposes the rce'd post loop, free of
>>>>>> control flow, to other optimizations.
>>>>>> Currently I have enabled this optimization for x86 only.  We base
>>>>>> this loop on successfully rce'd main loops, and if for whatever
>>>>>> reason multiversioning fails, we eliminate the loop we added.
>>>>>>
>>>>>> This code was tested as follows:
>>>>>>
>>>>>>
>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>>
>>>>>>
>>>>>> webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Michael
>>>>>>
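
A source-level sketch of the multiversioned post loop described in this thread, assuming the loop body's only range checks compare the induction variable against array lengths (the canonical LoadRange case) and that the start index is non-negative. C2 performs this transformation on its ideal graph, not in source; the function names and the `d[i] = a[i] * 2` body are hypothetical. The clean copy runs with its limit re-ranged to the minimum of the original limit and the array lengths, and the legacy checked loop handles any remaining iterations, taking the range check failure if one is due.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Stand-in for the Java range check failure (ArrayIndexOutOfBoundsException).
struct RangeCheckFailure : std::out_of_range {
  using std::out_of_range::out_of_range;
};

// Legacy post loop: every iteration performs its range checks.
static void checked_post_loop(std::vector<int>& d, const std::vector<int>& a,
                              int from, int to) {
  for (int i = from; i < to; i++) {
    if (i < 0 || i >= static_cast<int>(a.size()) || i >= static_cast<int>(d.size()))
      throw RangeCheckFailure("index out of range");
    d[i] = a[i] * 2;
  }
}

// Multiversioned post loop. Returns the number of iterations run by the
// clean (range-check-free) copy.
int multiversioned_post_loop(std::vector<int>& d, const std::vector<int>& a,
                             int start, int limit) {
  assert(start >= 0);
  // Re-range the limit: min of the original limit and every array length.
  int rce_limit = std::min({limit, static_cast<int>(a.size()),
                            static_cast<int>(d.size())});
  rce_limit = std::max(rce_limit, start);  // never run the clean copy backwards
  for (int i = start; i < rce_limit; i++) {
    d[i] = a[i] * 2;  // no checks needed: start <= i < every length
  }
  // Remaining iterations (if any) go to the legacy loop, which throws at
  // the first out-of-range index. Empty when rce_limit == limit.
  checked_post_loop(d, a, rce_limit, limit);
  return rce_limit - start;
}
```

When the optimizer can prove that the re-ranged limit equals the original limit, the call to the checked loop is dead and only the clean copy remains, which corresponds to eliminating the range-checked post loop.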

From michael.c.berg at intel.com  Sat Apr  2 03:25:16 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sat, 2 Apr 2016 03:25:16 +0000
Subject: CR for RFR 8151573
In-Reply-To: <56FF3A04.5090601@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E4C64D@FMSMSX102.amr.corp.intel.com>
	<56E881A9.7070004@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C6A9@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4C6DB@FMSMSX102.amr.corp.intel.com>
	<56E89CA4.8010201@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C719@FMSMSX102.amr.corp.intel.com>
	<56E97EC5.6030608@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C8E8@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4F529@FMSMSX102.amr.corp.intel.com>
	<56FC5852.2030101@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4F60F@FMSMSX102.amr.corp.intel.com>
	<56FCADE3.20403@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4FBFA@FMSMSX102.amr.corp.intel.com>
	<56FF3A04.5090601@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E4FFCA@FMSMSX102.amr.corp.intel.com>

I have to make a two line change, I am testing it on my end. I will pass the webrev directly to you when my tests conclude.

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Friday, April 01, 2016 8:18 PM
To: Berg, Michael C <michael.c.berg at intel.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: CR for RFR 8151573

I am starting pre-integration testing.

Thanks,
Vladimir

On 3/31/16 8:36 PM, Berg, Michael C wrote:
> Vladimir, I think I have addressed every concern in the latest webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/
>
> I wound up leaving divergent local copies of the clone code, as there are now two fully separate contexts in three locations.  Adding more parameters to get around it didn't seem to be a win.
> The code is fully retested with no issues.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, March 30, 2016 9:56 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; 
> 'hotspot-compiler-dev at openjdk.java.net' 
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: CR for RFR 8151573
>
> On 3/30/16 4:57 PM, Berg, Michael C wrote:
>> See below for context.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 30, 2016 3:51 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>; 
>> 'hotspot-compiler-dev at openjdk.java.net'
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: CR for RFR 8151573
>>
>> Michael,
>>
>> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes.
>>
>> multi_version_post_loops() can use is_canonical_main_loop_entry() from 8148754, but you need to modify it to move the is_Main() assert to the other call sites.
>>
>> <MCB> OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also, the canonical test is different.  I can leave the name, but it will then be overloaded with two types of functionality.  The graph shape test changes between main and post loops. Perhaps it is better to leave them separate, given they are different?  I will leave this one to last so that we have time to discuss it.
>
> I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)?
>
>>
>> I did not get rce'd post loop checks in loopnode.cpp.
>>
>> <MCB> First I will have to explain what I am doing with do_range_check().  That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit.  In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery.  Then insert_scalar_rced_post_loop(...) creates a copy of main, which successfully passed range check elimination and is canonical.  Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that it never executes an out-of-range index, leaving such indices to the final loop, unless the optimizer realizes that no such case exists, in which case the clean loop becomes the one and only copy.  If we cannot apply the multiversion transformation to the loop we added, we eliminate it.
>
> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>
> You should add a comment in do_range_check() about what you said (looking only for range checks). Also, don't use the old_new struct; use a simple counter and return it as the result (or return a bool from do_range_check()) to indicate that only range checks were in the loop.
>
> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand, you are looking for post loops which were not created by insert_scalar_rced_post_loop(). Right?
> There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
> Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
>
>>
>> Swap next checks since has_range_checks() may be expensive scanning loop body:
>> +  // only process RCE'd main loops
>> +  if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>
>> <MCB> Ok, makes sense.
>
> Sorry, I made a mistake because you use the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag. The other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But, please, rename has_range_checks(cl) method to avoid confusion.
>
> Thanks,
> Vladimir
>
>>
>> Also, if there are no checks in the loop, you will scan the loop's body again, since the no_checks state is not recorded.
>> I would suggest scanning the body unconditionally, because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>
>> <MCB> I think the real problem is to avoid scanning more than once after we check. I will move towards that solution.
>>
>>
>> Why do you need local copies?
>>
>> -  visited.Clear();
>> -  clones.clear();
>> +  Arena *a = Thread::current()->resource_area();
>> +  VectorSet visited(a);
>> +  Node_Stack clones(a, main_head->back_control()->outcnt());
>>
>> <MCB> I will look into this, and see if it can be cleaned up.
>>
>>
>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set.
>>
>> <MCB> Ok, I will look into a version without PostLoopInfo.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>>> Here is an update after full testing, the webrev is:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>>
>>> Please review and comment,
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>>> Berg, Michael C
>>> Sent: Wednesday, March 16, 2016 10:30 AM
>>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; 
>>> 'hotspot-compiler-dev at openjdk.java.net'
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: RE: CR for RFR 8151573
>>>
>>> Putting a hold on the review, retesting everything on my end.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, March 16, 2016 8:42 AM
>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: CR for RFR 8151573
>>>
>>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>>> Vladimir:
>>>>
>>>> Programmable SIMD depends on this because all versions of the final post loop keep their range checks until very late, after register allocation; they might be cleaned up by CFG optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>>
>>> I understand that we can get some benefits. But in the general case they will not be visible.
>>>
>>>>
>>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases?
>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>>
>>> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop.
>>>
>>> After thinking about this, I would suggest that you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform specific only, smaller, and easier to understand and test. Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops.
>>>
>>> Regards,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> As we all know, we can always construct microbenchmarks which show
>>>> a 30% - 50% difference, while in real applications we will never
>>>> see the difference. I still don't see a real reason why we should
>>>> spend time optimizing *POST* loops. We already have a vectorized
>>>> post loop to improve performance. Note that additional loop opts
>>>> code will raise its maintenance cost.
>>>>
>>>> Why "programmable SIMD" depends on it? What about pre-loop?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>>> Correction below...
>>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-compiler-dev
>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf 
>>>>> Of Berg, Michael C
>>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: RE: CR for RFR 8151573
>>>>>
>>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and on code in general that looks like this:
>>>>>
>>>>>          for(int i = 0; i < process_len; i++)
>>>>>          {
>>>>>            d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>>          }
>>>>>
>>>>> The above code makes 9 vector ops.
>>>>>
>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway.
>>>>> The value process_len is some fraction of the array length in my measurements.  The idea of the metrics is to pose a post loop with a modest number of iterations in it.  For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop.
>>>>>
>>>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was super-unrolled 4 times with vector length 16, i.e. an unroll of 64; the alignment (pre) loop processes 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work to the final post loop, in this case possibly a multiversioned post loop.  We start that final loop at iteration 81, so we always do at least 1 iteration of fixup, and as many as 15.  If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80.  By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations.
>>>>>
>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>>
>>>>> I thought it would be easier to do them separately.  Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: Re: CR for RFR 8151573
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Changes are significant so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> I would like to contribute multi-versioning post loops for range
>>>>>> check elimination.  Previously, range checks in post loops were
>>>>>> only optimized by cfg optimizations after register allocation.
>>>>>> I have added code which produces the desired effect much
>>>>>> earlier by introducing a safe transformation which will minimally
>>>>>> allow a range-check-free version of the final post loop to
>>>>>> execute up until the point it actually has to take a range check
>>>>>> exception, by re-ranging the limit of the rce'd loop; we then exit
>>>>>> the rce'd post loop and take the range check exception in the legacy loop's execution if required.
>>>>>> If during optimization we discover that we know enough to remove 
>>>>>> the range check version of the post loop, mostly by exposing the 
>>>>>> load range values into the limit logic of the rce'd post loop, we 
>>>>>> will eliminate the range check post loop altogether much like cfg 
>>>>>> optimizations did, but much earlier.  This gives optimizations 
>>>>>> like programmable SIMD (via SuperWord) the opportunity to 
>>>>>> vectorize the rce'd post loops to a single iteration based on 
>>>>>> mask vectors which map to the residual iterations. Programmable 
>>>>>> SIMD will be a follow-on changeset utilizing this code to stage 
>>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations.
>>>>>> Currently I have enabled this optimization for x86 only.  We base 
>>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added.
>>>>>>
>>>>>> This code was tested as follows:
>>>>>>
>>>>>>
>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>>
>>>>>>
>>>>>> webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Michael
>>>>>>
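The multiversioned post loop described in the RFR above can be pictured with a source-level analogy. C2 performs the transformation on its ideal graph, not on C++ source, so treat this only as a rough sketch. The clean copy's limit is clamped to the minimum of the loop limit and every array length the body indexes, so the clean copy can never go out of bounds. Whatever is left runs in the original checked loop, which takes the exception only if one is really due. The function name and the use of `std::out_of_range` as a stand-in for Java's ArrayIndexOutOfBoundsException are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical source-level model (not C2 code) of a multiversioned post loop
// for d[i] = a[i]*b[i] + a[i]*c[i] + b[i]*c[i], starting at index `start`.
void post_loop_multiversioned(const std::vector<float>& a,
                              const std::vector<float>& b,
                              const std::vector<float>& c,
                              std::vector<float>& d,
                              std::size_t start, std::size_t limit) {
  // Re-range the clean loop: clamp its limit to every indexed array length,
  // so no iteration of the clean copy can be out of bounds.
  std::size_t clean_limit = std::min({limit, a.size(), b.size(), c.size(), d.size()});
  std::size_t i = start;

  // Range-check-free version: operator[] performs no bounds check.
  for (; i < clean_limit; ++i) {
    d[i] = a[i] * b[i] + a[i] * c[i] + b[i] * c[i];
  }

  // Legacy (checked) version: only reached if an out-of-bounds index really
  // occurs; .at() throws std::out_of_range, standing in for the Java exception.
  for (; i < limit; ++i) {
    d.at(i) = a.at(i) * b.at(i) + a.at(i) * c.at(i) + b.at(i) * c.at(i);
  }
}
```

When the optimizer can prove `clean_limit == limit`, the checked loop is dead. This matches the case in the RFR where the range-check version of the post loop is eliminated altogether.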

From michael.c.berg at intel.com  Sat Apr  2 05:16:01 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sat, 2 Apr 2016 05:16:01 +0000
Subject: CR for RFR 8151573
In-Reply-To: <C568518E7B433348B114B6A7122D474756E4FFCA@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E4C64D@FMSMSX102.amr.corp.intel.com>
	<56E881A9.7070004@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C6A9@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4C6DB@FMSMSX102.amr.corp.intel.com>
	<56E89CA4.8010201@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C719@FMSMSX102.amr.corp.intel.com>
	<56E97EC5.6030608@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C8E8@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4F529@FMSMSX102.amr.corp.intel.com>
	<56FC5852.2030101@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4F60F@FMSMSX102.amr.corp.intel.com>
	<56FCADE3.20403@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4FBFA@FMSMSX102.amr.corp.intel.com>
	<56FF3A04.5090601@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4FFCA@FMSMSX102.amr.corp.intel.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E50046@FMSMSX102.amr.corp.intel.com>

That small revision is reflected in:

https://bugs.openjdk.java.net/browse/JDK-8151573

and can be accessed at:

http://cr.openjdk.java.net/~mcberg/8151573/webrev.04a/ 

Regards,
Michael

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Friday, April 01, 2016 8:25 PM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: RE: CR for RFR 8151573

I have to make a two-line change; I am testing it on my end. I will pass the webrev directly to you when my tests conclude.

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, April 01, 2016 8:18 PM
To: Berg, Michael C <michael.c.berg at intel.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: CR for RFR 8151573

I am starting pre-integration testing.

Thanks,
Vladimir

On 3/31/16 8:36 PM, Berg, Michael C wrote:
> Vladimir, I think I have addressed every concern in the latest webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/
>
> I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations.  Adding more parameters didn't seem to be a win to get around it.
> The code is fully retested with no issues.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, March 30, 2016 9:56 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; 
> 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: CR for RFR 8151573
>
> On 3/30/16 4:57 PM, Berg, Michael C wrote:
>> See below for context.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 30, 2016 3:51 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>; 
>> 'hotspot-compiler-dev at openjdk.java.net'
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: CR for RFR 8151573
>>
>> Michael,
>>
>> First, please update to the latest sources of hs-comp. The 8148754 changes modified the same files, and it is difficult to apply your changes.
>>
>> multi_version_post_loops() can use is_canonical_main_loop_entry()
>> from 8148754, but you need to modify it to move the
>> is_Main() assert to other call sites.
>>
>> <MCB> OK, but then shouldn't the name be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also, the canonical test is different. I can leave the name, but it will afterward be overloaded with two types of functionality. The graph shape test differs between main and post loops. Perhaps it is better to leave them separate, given they are different? I will leave this one to last so that we have time to discuss it.
>
> I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)?
>
>>
>> I did not understand the rce'd post loop checks in loopnode.cpp.
>>
>> <MCB> First I have to explain what I am doing with do_range_check(). That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so or for the optimizer to realize that no such case exists in which case the clean loop becomes the one and only copy.  If we cannot multiversion transform the loop we added we eliminate it.
>
> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>
> You should add a comment in do_range_check() about what you said (looking only for range checks). Also, don't use the old_new
>    struct; use a simple counter and return it as the result (or return a bool from do_range_check()) to indicate that only range checks were in the loop.
>
> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand, you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right?
> There is no comment about what it is searching for. There are 3 types of post loops now: atomic, rced and regular (from insert_pre_post_loops()).
> Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
>
>>
>> Swap the next checks, since has_range_checks() may be expensive (it scans the loop body):
>> +  // only process RCE'd main loops
>> +  if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>
>> <MCB> Ok, makes sense.
>
> Sorry, I made a mistake because you use the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag. The other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But, please, rename has_range_checks(cl) method to avoid confusion.
>
> Thanks,
> Vladimir
>
>>
>> Also, if there are no checks in the loop, you will scan the loop's body again since the no_checks state is not recorded.
>> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>
>> <MCB> As I see it, the real problem is to avoid scanning more than once after we check. I will move towards that solution.
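Two suggestions in this exchange can be combined: mark the rce'd post loop with a flag instead of searching for it, and record the outcome of the body scan (including "no checks") so the body is walked at most once. Giving the scan a name distinct from the flag query also avoids the `has_range_checks()` / `has_range_checks(cl)` confusion. The sketch below is entirely hypothetical. `LoopInfo`, `NodeKind`, the flag names and `scan_for_range_checks` are illustrative stand-ins, not HotSpot's `CountedLoopNode` API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node kinds in a loop body (not HotSpot's node classes).
enum class NodeKind { RangeCheck, OtherIf, Arith };

// Hypothetical loop descriptor; the flags mimic the idea of loop flag bits.
struct LoopInfo {
  enum Flags : std::uint8_t {
    RangeChecksScanned = 1 << 0,  // the body has already been scanned once
    HasRangeChecks     = 1 << 1,  // result of that scan
    RCEPostLoop        = 1 << 2,  // set when the clean post loop is created
  };

  std::vector<NodeKind> body;
  std::uint8_t flags = 0;
  int scans = 0;  // instrumentation only: counts real body walks

  // Cheap query: reads a flag and never walks the body.
  bool is_rce_post_loop() const { return (flags & RCEPostLoop) != 0; }
  void mark_rce_post_loop() { flags |= RCEPostLoop; }

  // Walks the body at most once and records the result, including the
  // "no range checks" outcome, so later callers do not rescan.
  bool scan_for_range_checks() {
    if ((flags & RangeChecksScanned) == 0) {
      ++scans;
      bool found = false;
      for (NodeKind k : body) {
        if (k == NodeKind::RangeCheck) { found = true; break; }
      }
      flags |= RangeChecksScanned;
      if (found) flags |= HasRangeChecks;
    }
    return (flags & HasRangeChecks) != 0;
  }
};
```

With the flag set at creation time, a search for "post loops not created by insert_scalar_rced_post_loop()" becomes a constant-time test rather than a body scan.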
>>
>>
>> Why do you need local copies?
>>
>> -  visited.Clear();
>> -  clones.clear();
>> +  Arena *a = Thread::current()->resource_area();
>> +  VectorSet visited(a);
>> +  Node_Stack clones(a, main_head->back_control()->outcnt());
>>
>> <MCB> I will look into this, and see if it can be cleaned up.
>>
>>
>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set.
>>
>> <MCB> Ok, I will look into a version without PostLoopInfo.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>>> Here is an update after full testing, the webrev is:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>>
>>> Please review and comment,
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>>> Berg, Michael C
>>> Sent: Wednesday, March 16, 2016 10:30 AM
>>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; 
>>> 'hotspot-compiler-dev at openjdk.java.net'
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: RE: CR for RFR 8151573
>>>
>>> Putting a hold on the review, retesting everything on my end.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, March 16, 2016 8:42 AM
>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: CR for RFR 8151573
>>>
>>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>>> Vladimir:
>>>>
>>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation; they might be cleaned up by cfg optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>>
>>> I understand that we can get some benefits. But in the general case they will not be visible.
>>>
>>>>
>>>> With regard to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases?
>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>>
>>> Yes, after you explained vector masking to me I now understand why it could be used for the post loop.
>>>
>>> After thinking about this, I would suggest that you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform specific only, smaller, and easier to understand and test. Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops.
>>>
>>> Regards,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> As we all know, we can always construct microbenchmarks which show
>>>> a 30% - 50% difference, while in real applications we will never see
>>>> a difference. I still don't see a real reason why we should spend
>>>> time to optimize *POST* loops. We already have a vectorized post loop
>>>> to improve performance. Note that additional loop opts code will raise its maintenance cost.
>>>>
>>>> Why "programmable SIMD" depends on it? What about pre-loop?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>>> Correction below...
>>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-compiler-dev
>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf 
>>>>> Of Berg, Michael C
>>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: RE: CR for RFR 8151573
>>>>>
>>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this:
>>>>>
>>>>>          for(int i = 0; i < process_len; i++)
>>>>>          {
>>>>>            d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>>          }
>>>>>
>>>>> The above code makes 9 vector ops.
>>>>>
>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway.
>>>>> The value of process_len is some fraction of the array length in my measurements. The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop.
>>>>>
>>>>> An example would be array_length = 512 and process_len in the range 81..96. We create a VecZ loop which was super-unrolled 4 times with vector length 16, i.e. an unroll of 64; the alignment (pre) loop processes 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work to the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81, so we always do at least 1 iteration of fixup, and as many as 15. If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations.
>>>>>
>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>>
>>>>> I thought it would be easier to do them separately.  Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: Re: CR for RFR 8151573
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Changes are significant so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> I would like to contribute multi-versioning post loops for range
>>>>>> check elimination.  Previously, range checks in post loops were
>>>>>> only optimized by cfg optimizations after register allocation.
>>>>>> I have added code which produces the desired effect much
>>>>>> earlier by introducing a safe transformation which will minimally
>>>>>> allow a range-check-free version of the final post loop to
>>>>>> execute up until the point it actually has to take a range check
>>>>>> exception, by re-ranging the limit of the rce'd loop; we then exit
>>>>>> the rce'd post loop and take the range check exception in the legacy loop's execution if required.
>>>>>> If during optimization we discover that we know enough to remove 
>>>>>> the range check version of the post loop, mostly by exposing the 
>>>>>> load range values into the limit logic of the rce'd post loop, we 
>>>>>> will eliminate the range check post loop altogether much like cfg 
>>>>>> optimizations did, but much earlier.  This gives optimizations 
>>>>>> like programmable SIMD (via SuperWord) the opportunity to 
>>>>>> vectorize the rce'd post loops to a single iteration based on 
>>>>>> mask vectors which map to the residual iterations. Programmable 
>>>>>> SIMD will be a follow-on changeset utilizing this code to stage 
>>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations.
>>>>>> Currently I have enabled this optimization for x86 only.  We base 
>>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added.
>>>>>>
>>>>>> This code was tested as follows:
>>>>>>
>>>>>>
>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>>
>>>>>>
>>>>>> webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Michael
>>>>>>
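The trip-count arithmetic in Michael's example above can be checked with a small model. The model takes the six trips quoted for the pre, main and vectorized post loops ({4,1,1}) as an input rather than deriving them. It compares two fixup strategies. A scalar fixup loop needs one trip per leftover element. A masked vector fixup covers 1..15 leftover floats in one 16-lane trip, which is the "9 vector ops" body run once under a mask. The mask construction and the scalar emulation of a masked trip are illustrative assumptions about how such a mask could be formed, not the programmable SIMD implementation.

```cpp
#include <cstdint>

// Trips the example attributes to the pre, main and vectorized post loops.
constexpr int kInitialTrips = 4 + 1 + 1;
// 512-bit VecZ register holding 32-bit floats.
constexpr int kLanes = 16;

// Scalar fixup loop: one trip per residual element.
constexpr int trips_with_scalar_fixup(int residual) { return kInitialTrips + residual; }

// Masked vector fixup: a single trip handles 1..kLanes-1 residual elements.
constexpr int trips_with_masked_fixup(int residual) {
  return kInitialTrips + (residual > 0 ? 1 : 0);
}

// Hypothetical lane mask enabling the low `residual` lanes (bit i => lane i).
constexpr std::uint32_t residual_mask(int residual) {
  return residual >= kLanes ? 0xFFFFu : ((1u << residual) - 1u);
}

// Scalar emulation of one masked trip of d = a*b + a*c + b*c. Inactive lanes
// are never touched, so arrays only need `residual` valid elements.
void masked_fixup(const float* a, const float* b, const float* c, float* d,
                  std::uint32_t mask) {
  for (int lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) {
      d[lane] = a[lane] * b[lane] + a[lane] * c[lane] + b[lane] * c[lane];
    }
  }
}
```

With residuals of 1 and 15, the scalar fixup gives 7 and 21 trips and the masked fixup gives 7 in both cases, matching the "7 to 21" versus "always 7" comparison in the mail.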

From jamsheed.c.m at oracle.com  Mon Apr  4 06:14:14 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Mon, 4 Apr 2016 11:44:14 +0530
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
Message-ID: <57020636.7010806@oracle.com>

Hi Martin,

"nmethod's exception cache not multi-thread safe"  bug is fixed in b107
bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
discussion link: 
http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html

Best Regards,
Jamsheed

On 4/1/2016 6:07 PM, Doerr, Martin wrote:
>
> Hello everyone,
>
> we have found a concurrency problem with the nmethod's exception 
> cache. Readers of the cache may read stale data on weak memory platforms.
>
> The writers of the cache are synchronized by locks, but there may be 
> concurrent readers: The compiler runtimes use 
> nmethod::handler_for_exception_and_pc to access the cache without locking.
>
> Therefore, the nmethod's field _exception_cache needs to be volatile 
> and adding new entries must be done by releasing stores. (Loading 
> seems to be fine without acquire because there's an address dependency 
> from the load of the cache to the usage of its contents which is 
> sufficient to ensure ordering on all openjdk platforms.)
>
> I also added a minor cleanup: I changed nmethod::is_alive to read the 
> volatile field _state only once. It is certainly undesired to force 
> the compiler to load it from memory twice.
>
> Webrev is here:
>
> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/
>
> Please review. I will also need a sponsor.
>
> Best regards,
>
> Martin
>
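Martin's description follows a common publication pattern. Writers are serialized by a lock. Readers walk the list without one. A new entry becomes visible only through a releasing store made after it is fully initialized. The state field is read once into a local before it is tested. The portable C++ sketch below illustrates that pattern and is not HotSpot code. `ExceptionCacheSketch`, `Entry`, `MethodStateSketch` and the state values are hypothetical. The reader uses `memory_order_acquire` because portable C++ cannot express the reliance on an address dependency described in the mail (`memory_order_consume` is, in practice, implemented as acquire by current compilers).

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical analogue of an exception cache: a singly linked list that
// writers prepend to under a lock while readers walk it lock-free.
class ExceptionCacheSketch {
 public:
  struct Entry {
    std::uintptr_t exception_type;
    std::uintptr_t pc;
    std::uintptr_t handler;
    Entry* next;
  };

  ~ExceptionCacheSketch() {
    Entry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) { Entry* n = e->next; delete e; e = n; }
  }

  // Writer: fully initialize the entry, then publish it with a release store,
  // so a reader that observes the new head also observes the entry's fields.
  void add(std::uintptr_t type, std::uintptr_t pc, std::uintptr_t handler) {
    std::lock_guard<std::mutex> guard(lock_);
    Entry* e = new Entry{type, pc, handler, head_.load(std::memory_order_relaxed)};
    head_.store(e, std::memory_order_release);
  }

  // Reader: no lock taken. Returns 0 when no handler is cached.
  std::uintptr_t lookup(std::uintptr_t type, std::uintptr_t pc) const {
    for (Entry* e = head_.load(std::memory_order_acquire); e != nullptr; e = e->next) {
      if (e->exception_type == type && e->pc == pc) return e->handler;
    }
    return 0;
  }

 private:
  std::atomic<Entry*> head_{nullptr};
  std::mutex lock_;
};

// Hypothetical state values; only their meaning matters for this sketch.
enum : unsigned char { kInUse = 0, kNotEntrant = 1, kZombie = 2, kUnloaded = 3 };

struct MethodStateSketch {
  std::atomic<unsigned char> state{kInUse};

  // Load the shared state once into a local and test the local; writing
  // `state == kInUse || state == kNotEntrant` would force two separate loads.
  bool is_alive() const {
    unsigned char s = state.load(std::memory_order_relaxed);
    return s == kInUse || s == kNotEntrant;
  }
};
```

Entries are never removed in this sketch. Purging stale entries while readers run would need extra care beyond what is shown here.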


From rahul.v.raghavan at oracle.com  Mon Apr  4 08:09:08 2016
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Mon, 4 Apr 2016 01:09:08 -0700 (PDT)
Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in
	regmask.cpp
In-Reply-To: <56FD74F2.2080102@oracle.com>
References: <f9892734-a3d1-4bce-b9c5-f7df762ba922@default>
	<56FC2A4B.5030905@oracle.com>
	<b8802940-b4cd-4c22-b9cd-a2045c5cafc5@default>
	<d9e683a3-662d-4da9-a9fc-2f7f667f0687@default>
	<C568518E7B433348B114B6A7122D474756E4F914@FMSMSX102.amr.corp.intel.com>
	<56FD74F2.2080102@oracle.com>
Message-ID: <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default>

Hi,

Please review the revised fix for JDK-8149488.

<webrev.02>: http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ 

Based on further checking, and thanks to clarifications from Michael,
it was verified that the 8149488 issue can be fixed just by correcting the bitsInByte size to 256 in 'regmask.cpp'
(extending the bitsInByte table size to 512, as mentioned earlier, is not required).

Points from Michael for the record - "
   > I believe Dean is right, I have debugged this and analyzed the usage model, 
   > we never made use of the upper components 
   > and register allocation has been right for VecZ for a good deal of time.
   > 
   > All we need for a change is,
   > Regmask.cpp:
   > 
   > uint RegMask::Size() const {
   >    extern uint8_t bitsInByte[256];
   >
   > A one line change.
   > 
   > -Michael.
   > 
   > p.s. I ran this on SDE for the new hw and on the hw itself, and found all is working with the above change.
   > I tested SPECjvm2008, which has tons of EVEX code generated in it, on SKX
   > where we make use of VecZ and the upper bank of registers."

So in this revised webrev.02, I defined BITS_IN_BYTE_ARRAY_SIZE as 256 and used it for the array size.
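Dean's question further down the thread ("Wouldn't that mean we have 9-bit bytes?") has a short answer. A per-byte bit-count table is indexed by a byte value, 0..255, so it needs exactly 2^8 = 256 entries, whatever the register width. The sketch below builds such a table at compile time and uses it the way a `Size()` routine over a mask of 32-bit words could, one byte at a time. `kBitsInByte` and `mask_size` are illustrative names, not the code in `vectset.cpp` or `regmask.cpp`, and C++17 is assumed for the `constexpr` table construction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One entry per possible byte value: 2^8 = 256.
constexpr std::size_t kBitsInByteArraySize = 256;

constexpr std::array<std::uint8_t, kBitsInByteArraySize> make_bits_in_byte() {
  std::array<std::uint8_t, kBitsInByteArraySize> table{};
  for (std::size_t v = 0; v < kBitsInByteArraySize; ++v) {
    // Bit count of v = bit count of (v >> 1) plus v's lowest bit.
    table[v] = static_cast<std::uint8_t>(table[v >> 1] + (v & 1));
  }
  return table;
}

constexpr auto kBitsInByte = make_bits_in_byte();

// Counts set bits in a mask stored as 32-bit words, one byte at a time.
// Each index is masked with 0xFF, so entries past 255 could never be read.
std::uint32_t mask_size(const std::uint32_t* words, std::size_t n) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::uint32_t w = words[i];
    for (int shift = 0; shift < 32; shift += 8) {
      sum += kBitsInByte[(w >> shift) & 0xFFu];
    }
  }
  return sum;
}
```

Wider registers such as VecZ only mean more words (and therefore more bytes) in the mask. They never produce a byte index above 0xFF.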

Confirmed no issues with 'JPRT -testset hotspot' run.

Thanks,
Rahul

> -----Original Message-----
> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM
> 
> Michael, isn't the correct size for this table 256?  I missed how VecZ
> relates to the table size.
> 
> dl
> 
> On 3/31/2016 9:58 AM, Berg, Michael C wrote:
> > Up until now we have gotten along with the size constraint only.
> > Let us have both the size and the table though for completeness.
> > I think we can leave the name though.
> >
> > -Michael
> >
> > -----Original Message-----
> > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com]
> > Sent: Thursday, March 31, 2016 9:18 AM
> > To: Dean Long <dean.long at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C <michael.c.berg at intel.com>
> > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp
> >
> > Hi Michael,
> >
> > With respect to below thread, request help with some questions.
> > Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size.
> > Also, the comment I got was that the bitsInByte table must be extended to 512 entries for consistent mapping of the VecZ register, on
> targets that support it.
> > But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here.
> > I could not find any use of the extended set of values in the bitsInByte table. Is it for something to be used in the future?
> >
> > So for now, it seems that following the original fix for 8149488 (only correcting the size in RegMask::Size()) is okay?
> > (without extending the current bitsInByte array contents). In any case, at present no value above 0xFF is ever used as an index into bitsInByte in
> RegMask::Size().
> >
> >           ----- src/share/vm/libadt/vectset.hpp
> >           +#define BITS_IN_BYTE_ARRAY_SIZE 256
> >           +
> >
> >           ----- src/share/vm/opto/regmask.cpp
> >           -  extern uint8_t bitsInByte[512];
> >           +  extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];
> >
> >           ----- src/share/vm/libadt/vectset.cpp
> >           -uint8_t bitsInByte[256] = {
> >           +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = {
> >
> > I can send revised webrev for above if all okay. Please tell me if I am missing something.
> >
> >
> > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ?
> > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]')
> >
> > Thanks,
> > Rahul
> >
> >> -----Original Message-----
> >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM
> >>
> >>> -----Original Message-----
> >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM
> >>>
> >>> When do we access elements 256 .. 511?  Wouldn't that mean we have
> >>> 9-bit bytes?
> >> Got your point Dean, Thanks.
> >> I too got some questions here now; will check and reply soon.
> >>
> >> -Rahul
> >>
> >>> dl
> >>>
> >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote:
> >>>> Hi,
> >>>>
> >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488.
> >>>>
> >>>> <bug>: https://bugs.openjdk.java.net/browse/JDK-8149488
> >>>> <webrev.01>:
> >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/
> >>>>
> >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512.
> >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512.
> >>>> Confirmed no issues with 'JPRT -testset hotspot' run.
> >>>>
> >>>> Thanks,
> >>>> Rahul
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent:
> >>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler-
> >>> dev at openjdk.java.net
> >>>>> Should we not extend:
> >>>>>
> >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp:
> >>>>>    uint8_t bitsInByte[256] = { // ...
> >>>>>
> >>>>> to 512
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov'
> >>>>>
> >>>>> So how do we intend to map a VecZ register without 512 bits?
> >>>>>
> >>>>> -Michael
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: hotspot-compiler-dev
> >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf
> >>>>> Of Vladimir Ivanov
> >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan;
> >>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>
> >>>>> Rahul,
> >>>>>
> >>>>> Can we define a constant instead and use it in both places?
> >>>>>
> >>>>> Best regards,
> >>>>> Vladimir Ivanov
> >>>>>
> >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Please review the patch for JDK-8149488.
> >>>>>>
> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488
> >>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/
> >>>>>>
> >>>>>> Corrected the bitsInByte array size in declaration.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Rahul
> >>>>>>
> 

From zoltan.majo at oracle.com  Mon Apr  4 10:49:40 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Mon, 4 Apr 2016 12:49:40 +0200
Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop
	optimizations to 'develop'
In-Reply-To: <56FE6A41.6030703@oracle.com>
References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com>
	<56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com>
	<56FE6A41.6030703@oracle.com>
Message-ID: <570246C4.7050504@oracle.com>

Thank you, Vladimir and Chris, for the reviews! For the record: I'll 
push the latest webrev (webrev.03) today.

Best regards,


Zoltan

On 04/01/2016 02:32 PM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> thank you for the feedback!
>
> On 03/31/2016 05:57 PM, Vladimir Kozlov wrote:
>> It is nice to have non-product flags, which are easy to remove :)
>
> Yes, indeed. :-)
>
>>
>> Clean up looks good.
>
> Thank you.
>
>>
>> Can you leave the test but only remove "-XX:+UnlockDiagnosticVMOptions
>> -XX:-LoopLimitCheck"? It has an interesting code shape. Add a comment
>> that it was run with "-XX:+UnlockDiagnosticVMOptions
>> -XX:-LoopLimitCheck" to trigger the problem.
>
> Yes, of course. Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>> On 3/31/16 1:15 AM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> thank you for your feedback!
>>>
>>> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote:
>>>> These flags were added when I fixed a long-standing C2 problem with
>>>> counted loops: 5091921.
>>>> They were added to have the ability to revert back to the original code if
>>>> the new code caused a problem.
>>>> It looks like the old code, which is executed with these flags switched
>>>> off, has become rotten.
>>>>
>>>> Zoltan, did you find what caused the crash? It looks like a product VM
>>>> was used in the bug report. What result does a
>>>> fastdebug VM give?
>>>
>>> I've tried starting different VM versions with the flag(s) off. The 
>>> most frequent error I get is
>>>
>>> # Internal Error 
>>> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), 
>>> pid=32727, tid=32746
>>> # assert(false) failed: Bad graph detected in build_loop_late
>>>
>>> So it seems that the code executed with the flags off has indeed 
>>> become rotten.
>>>
>>>> Converting flags to develop will not prevent problems happening 
>>>> with fastdebug VM where these flags could be switched
>>>> off even when they are develop.
>>>>
>>>> If the problem with the original code (flags are off) is something
>>>> fundamental, we may simply remove the old code and remove
>>>> these flags and have only the new code. 5 years have already passed since
>>>> 5091921 was fixed.
>>>
>>> Yes, I agree. I think it's reasonable to remove the old code.
>>>
>>> Here is the new webrev:
>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/
>>>
>>> The changes pass JPRT.
>>>
>>> I've changed the title of the bug to "Cleanup: Remove some unused 
>>> flags/code in loop optimizations" to better reflect
>>> what the change is doing. I have kept the original title in the RFR.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/23/16 6:26 AM, Zoltán Majó wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> please review the patch for 8072422.
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8072422
>>>>>
>>>>> Problem: Some flags controlling loop optimizations are currently 
>>>>> 'diagnostic'. Even though these flags are useful
>>>>> mostly for compiler-related development, their value can be 
>>>>> changed not only in
>>>>> fastdebug, but also in release builds.
>>>>>
>>>>> Solution: Change the flags to 'develop'.
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/
>>>>>
>>>>> Testing:
>>>>> - locally built/started VM;
>>>>> - locally executed 
>>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java.
>>>>>
>>>>> Thank you and best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>
>


From zoltan.majo at oracle.com  Mon Apr  4 10:50:39 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Mon, 4 Apr 2016 12:50:39 +0200
Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop
	optimizations to 'develop'
In-Reply-To: <570246C4.7050504@oracle.com>
References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com>
	<56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com>
	<56FE6A41.6030703@oracle.com> <570246C4.7050504@oracle.com>
Message-ID: <570246FF.6030206@oracle.com>

P.S.: Typo in my previous mail: I meant webrev.02 (and not webrev.03). 
Sorry.

On 04/04/2016 12:49 PM, Zoltán Majó wrote:
> Thank you, Vladimir and Chris, for the reviews! For the record: I'll 
> push the latest webrev (webrev.03) today.
>
> Best regards,
>
>
> Zoltan
>
> On 04/01/2016 02:32 PM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for the feedback!
>>
>> On 03/31/2016 05:57 PM, Vladimir Kozlov wrote:
>>> It is nice to have non-product flags, which are easy to remove :)
>>
>> Yes, indeed. :-)
>>
>>>
>>> Clean up looks good.
>>
>> Thank you.
>>
>>>
>>> Can you leave the test but remove only "-XX:+UnlockDiagnosticVMOptions
>>> -XX:-LoopLimitCheck"? It has an interesting code shape. Add a
>>> comment that it was run with "-XX:+UnlockDiagnosticVMOptions
>>> -XX:-LoopLimitCheck" to trigger the problem.
>>
>> Yes, of course. Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/31/16 1:15 AM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> thank you for your feedback!
>>>>
>>>> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote:
>>>>> These flags were added when I fixed long standing C2 problem with 
>>>>> counted loops: 5091921.
>>>>> They were added to have the ability to revert to the original code if
>>>>> the new code caused a problem.
>>>>> It looks like the old code, which is executed with these flags switched
>>>>> off, has become rotten.
>>>>>
>>>>> Zoltan, did you find what caused the crash? It looks like a product VM
>>>>> was used in the bug report. What result does the
>>>>> fastdebug VM give?
>>>>
>>>> I've tried starting different VM versions with the flag(s) off. The 
>>>> most frequent error I get is
>>>>
>>>> # Internal Error 
>>>> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), 
>>>> pid=32727, tid=32746
>>>> # assert(false) failed: Bad graph detected in build_loop_late
>>>>
>>>> So it seems that the code executed with the flags off has indeed 
>>>> become rotten.
>>>>
>>>>> Converting the flags to develop will not prevent problems from happening
>>>>> with a fastdebug VM, where these flags could be switched
>>>>> off even when they are develop.
>>>>>
>>>>> If the problem with the original code (flags off) is something
>>>>> fundamental, we may simply remove the old code and remove
>>>>> these flags and have only the new code. 5 years have already passed since
>>>>> 5091921 was fixed.
>>>>
>>>> Yes, I agree. I think it's reasonable to remove the old code.
>>>>
>>>> Here is the new webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/
>>>>
>>>> The changes pass JPRT.
>>>>
>>>> I've changed the title of the bug to "Cleanup: Remove some unused 
>>>> flags/code in loop optimizations" to better reflect
>>>> what the change is doing. I have kept the original title in the RFR.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/23/16 6:26 AM, Zoltán Majó wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> please review the patch for 8072422.
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8072422
>>>>>>
>>>>>> Problem: Some flags controlling loop optimizations are currently 
>>>>>> 'diagnostic'. Even though these flags are useful
>>>>>> mostly for compiler-related development, their value can be 
>>>>>> changed not only in
>>>>>> fastdebug, but also in release builds.
>>>>>>
>>>>>> Solution: Change the flags to 'develop'.
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/
>>>>>>
>>>>>> Testing:
>>>>>> - locally built/started VM;
>>>>>> - locally executed 
>>>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java.
>>>>>>
>>>>>> Thank you and best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>
>>
>
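Vladimir's point that converting the flags to 'develop' would not make the old code path safe can be illustrated with a small sketch. It is loosely modeled on how HotSpot declares flag kinds, but the macro and flag names below are hypothetical, not HotSpot's real declarations: a 'develop' flag is a compile-time constant only in product builds, so in a debug or fastdebug build the old path stays reachable and can still rot.

```cpp
#include <cassert>

// Hypothetical flag-declaration macros (not HotSpot's real ones).
// A 'diagnostic' flag is a mutable global in every build type; HotSpot
// additionally requires -XX:+UnlockDiagnosticVMOptions before it may be set.
#define SKETCH_DIAGNOSTIC_FLAG(type, name, value) type name = value;

// A 'develop' flag is a mutable global only in non-product builds
// (debug/fastdebug). In product builds it becomes a compile-time
// constant, so code guarded by a non-default value is dead there.
#ifdef PRODUCT
#define SKETCH_DEVELOP_FLAG(type, name, value) const type name = value;
#else
#define SKETCH_DEVELOP_FLAG(type, name, value) type name = value;
#endif

SKETCH_DIAGNOSTIC_FLAG(bool, SketchDiagnosticCheck, true)  // hypothetical flag
SKETCH_DEVELOP_FLAG(bool, SketchDevelopCheck, true)        // hypothetical flag

// Stand-in for a loop optimization with a new path (1, flag on) and an
// old path (0, flag off). Without PRODUCT the old path is reachable.
int select_loop_path(bool flag) {
  return flag ? 1 : 0;
}
```

Removing the flags together with the old code, as the thread concluded, sidesteps the question of which builds can reach the old path.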


From martin.doerr at sap.com  Mon Apr  4 12:27:49 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 4 Apr 2016 12:27:49 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <E546C97E-4A4F-4A15-B989-A7320A106508@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<E546C97E-4A4F-4A15-B989-A7320A106508@oracle.com>
Message-ID: <d931f8000fcd4be9a69797721c4ee6f1@DEWDFE13DE14.global.corp.sap>

Hi Christian,

thanks for taking a look.

I had checked the other places which use set_exception_cache. They either set it to NULL or to an unmodified pre-existing object (which gets released after creation by a cumulative memory barrier after my change). Both should be ok.

I have seen many places in hotspot where we have a set_... function and a release_set_... one. So I thought this was kind of common practice.
But I don't have a strong opinion on it.
Best regards,
Martin


From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Friday, April 1, 2016 18:34
To: Doerr, Martin <martin.doerr at sap.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe


On Apr 1, 2016, at 2:37 AM, Doerr, Martin <martin.doerr at sap.com> wrote:

Hello everyone,

we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.

The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.)
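The publication pattern described above can be sketched with C++11 std::atomic, assuming a plain singly linked list as a stand-in for HotSpot's ExceptionCache (the types and names below are hypothetical; HotSpot itself uses its own OrderAccess layer rather than std::atomic). Writers are serialized by a mutex and publish each fully initialized entry with a release store. The lock-free reader here uses an acquire load, the conservative choice Andrew Haley argues for in this thread; relying on the address dependency alone would correspond to memory_order_consume.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for one exception cache entry.
struct CacheEntry {
  int pc;            // key: the throwing pc
  int handler;       // value: the handler (an int here for brevity)
  CacheEntry* next;  // next (older) entry
};

class ExceptionCacheSketch {
 public:
  ~ExceptionCacheSketch() {
    // Single-threaded teardown: free the whole list.
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* next = e->next;
      delete e;
      e = next;
    }
  }

  // Writers are serialized by the lock. The entry is fully initialized
  // before the release store makes it reachable from head_, so a reader
  // that observes the new pointer also observes the entry's fields.
  void add(int pc, int handler) {
    std::lock_guard<std::mutex> guard(lock_);
    CacheEntry* e = new CacheEntry{pc, handler, head_.load(std::memory_order_relaxed)};
    head_.store(e, std::memory_order_release);
  }

  // Lock-free reader: the acquire load pairs with the release store in add().
  // Returns -1 if no entry matches.
  int lookup(int pc) const {
    for (CacheEntry* e = head_.load(std::memory_order_acquire); e != nullptr; e = e->next) {
      if (e->pc == pc) return e->handler;
    }
    return -1;
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex lock_;
};
```

Older entries stay visible to the reader because each writer's release store happens after the previous writer's, via the mutex.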

I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
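The is_alive cleanup can be illustrated with a small sketch; the class and state values below are hypothetical stand-ins, not HotSpot's actual nmethod definition. Because the field is volatile, every evaluated mention of it in the source is a separate load, so the first version may load it twice (and the two comparisons may see different values if the state changes concurrently). The second version copies it into a local exactly once.

```cpp
#include <cassert>

// Hypothetical nmethod-like states (illustrative values only).
enum SketchState : signed char { in_use = 0, not_entrant = 1, zombie = 2, unloaded = 3 };

class NMethodStateSketch {
 public:
  void set_state(signed char s) { state_ = s; }

  // Before: up to two loads of the volatile field.
  bool is_alive_two_loads() const {
    return state_ == in_use || state_ == not_entrant;
  }

  // After: one load into a local; both comparisons test the same value.
  bool is_alive() const {
    signed char s = state_;
    return s == in_use || s == not_entrant;
  }

 private:
  volatile signed char state_ = in_use;
};
```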

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/

Does it make sense to keep:

   void set_exception_cache(ExceptionCache *ec)    { _exception_cache = ec; }
or would it be safer to always do the store-release even when clearing the cache?


Please review. I will also need a sponsor.

Best regards,
Martin


From martin.doerr at sap.com  Mon Apr  4 12:57:31 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 4 Apr 2016 12:57:31 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <56FEA4FE.2010807@redhat.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<56FEA4FE.2010807@redhat.com>
Message-ID: <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap>

Hi Andrew,

there are many places in hotspot where we rely on ordering by address dependency. That sounds feasible to me since we're not supporting Alpha processors.
The load of the ExceptionCache pointer is only used to access elements of the cache, not to establish ordering of other accesses.

I don't think compilers are allowed to break anything here because I have made the field volatile. This prevents optimizers from reordering, optimizing out or duplicating some of the loads.
All supported processors respect the ordering (due to address dependency), too, so I believe we're ok.

I'm not sure if this is what you're concerned about. Did I miss anything?

Best regards,
Martin


-----Original Message-----
From: Andrew Haley [mailto:aph at redhat.com] 
Sent: Friday, April 1, 2016 18:43
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

On 04/01/2016 01:37 PM, Doerr, Martin wrote:

> Therefore, the nmethod's field _exception_cache needs to be volatile
> and adding new entries must be done by releasing stores. (Loading
> seems to be fine without acquire because there's an address
> dependency from the load of the cache to the usage of its contents
> which is sufficient to ensure ordering on all openjdk platforms.)

I think that's very risky.  We can't be really sure what an optimizer
might do in this area, as discussed at (very) considerable length in
concurrency forums.  memory_order_consume does this correctly in C++11
but we're not yet using C++11.  I'd use acquire and leave a note that
in future this can be replaced by memory_order_consume.

Andrew.

From aleksey.shipilev at oracle.com  Mon Apr  4 13:02:09 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 4 Apr 2016 16:02:09 +0300
Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should
	not assume asserts are benign
In-Reply-To: <56FE5D00.7000209@oracle.com>
References: <56FE5D00.7000209@oracle.com>
Message-ID: <570265D1.6040905@oracle.com>

On 04/01/2016 02:35 PM, Aleksey Shipilev wrote:
> Hi,
> 
> compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify
> String Concat strategies, because some of them are loading new methods
> and use them during String concat linkage and execution. Notably, this
> will happen inside of the asserts. We need to prime the asserts before
> using them in-between counter polls.
> 
> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8153265
> 
> Webrev:
>   http://cr.openjdk.java.net/~shade/8153265/webrev.00/
> 
> Testing: offending test in oob/-Xcomp modes

Anyone?

Thanks,
-Aleksey



From ivan at azulsystems.com  Mon Apr  4 13:12:20 2016
From: ivan at azulsystems.com (Ivan Krylov)
Date: Mon, 4 Apr 2016 16:12:20 +0300
Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should
	not assume asserts are benign
In-Reply-To: <570265D1.6040905@oracle.com>
References: <56FE5D00.7000209@oracle.com> <570265D1.6040905@oracle.com>
Message-ID: <57026834.6090907@azulsystems.com>

Looks right, but I am not a reviewer. Ivan

On 04/04/2016 16:02, Aleksey Shipilev wrote:
> On 04/01/2016 02:35 PM, Aleksey Shipilev wrote:
>> Hi,
>>
>> compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify
>> String Concat strategies, because some of them are loading new methods
>> and use them during String concat linkage and execution. Notably, this
>> will happen inside of the asserts. We need to prime the asserts before
>> using them in-between counter polls.
>>
>> Bug:
>>    https://bugs.openjdk.java.net/browse/JDK-8153265
>>
>> Webrev:
>>    http://cr.openjdk.java.net/~shade/8153265/webrev.00/
>>
>> Testing: offending test in oob/-Xcomp modes
> Anyone?
>
> Thanks,
> -Aleksey
>
>


From aph at redhat.com  Mon Apr  4 14:01:32 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 4 Apr 2016 15:01:32 +0100
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<56FEA4FE.2010807@redhat.com>
	<2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap>
Message-ID: <570273BC.4090605@redhat.com>

On 04/04/2016 01:57 PM, Doerr, Martin wrote:

> there are many places in hotspot where we rely on ordering by
> address dependency. That sounds feasible to me since we're not
> supporting Alpha processors.  The load of the ExceptionCache pointer
> is only used to access elements of the cache, not to establish
> ordering of other accesses.
> 
> I don't think compilers are allowed to break anything here because I
> have made the field volatile. This prevents optimizers from
> reordering, optimizing out or duplicating some of the loads.  All
> supported processors respect the ordering (due to address
> dependency), too, so I believe we're ok.

That sounds alright, at least from an informal reasoning
perspective.

> I'm not sure if this is what you're concerned about. Did I miss
> anything?

I don't think so.  I presume you've read Hans Boehm's paper where he
points out that it's very hard to rely on dependencies for memory
ordering in any high-level language [1].  For that reason I tend to
err on the side of caution when reasoning about memory.  You're
sailing a bit too close to the rocks for my comfort.  :-)

Andrew.


[1]  http://www.hboehm.info/c++mm/dependencies.html

From dean.long at oracle.com  Mon Apr  4 18:34:45 2016
From: dean.long at oracle.com (Dean Long)
Date: Mon, 4 Apr 2016 11:34:45 -0700
Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in
	regmask.cpp
In-Reply-To: <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default>
References: <f9892734-a3d1-4bce-b9c5-f7df762ba922@default>
	<56FC2A4B.5030905@oracle.com>
	<b8802940-b4cd-4c22-b9cd-a2045c5cafc5@default>
	<d9e683a3-662d-4da9-a9fc-2f7f667f0687@default>
	<C568518E7B433348B114B6A7122D474756E4F914@FMSMSX102.amr.corp.intel.com>
	<56FD74F2.2080102@oracle.com>
	<02d43f0c-f8a1-4f8c-9242-c110d78c314f@default>
Message-ID: <5702B3C5.8070507@oracle.com>

Looks OK.

dl

On 4/4/2016 1:09 AM, Rahul Raghavan wrote:
> Hi,
>
> Please review the revised fix for JDK- 8149488.
>
> <webrev.02>: http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/
>
> Based on further checking and thanks to clarifications from Michael,
> it was verified that the 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp'
> (and that the earlier-mentioned extension of the bitsInByte table to 512 entries is not required).
>
> Points from Michael for the record - "
>     > I believe Dean is right, I have debugged this and analyzed the usage model,
>     > we never made use of the upper components
>     > and register allocation has been right for VecZ for a good deal of time.
>     >
>     > All we need for a change is,
>     > Regmask.cpp:
>     >
>     > uint RegMask::Size() const {
>     >    extern uint8_t bitsInByte[256];
>     >
>     > A one line change.
>     >
>     > -Michael.
>     >
>     > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change.
>     > I tested SPECjvm2008, which has tons of EVEX code generated in it, on SKX
>     > where we make use of VecZ and the upper bank of registers."
>
> So in this revised webrev.02, I defined BITS_IN_BYTE_ARRAY_SIZE as 256 and used it for the array size.
>
> Confirmed no issues with 'JPRT -testset hotspot' run.
>
> Thanks,
> Rahul
>
>> -----Original Message-----
>> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM
>>
>> Michael, isn't the correct size for this table 256?  I missed how VecZ
>> relates to the table size.
>>
>> dl
>>
>> On 3/31/2016 9:58 AM, Berg, Michael C wrote:
>>> Up until now we have gotten along with the size constraint only.
>>> Let us have both the size and the table though for completeness.
>>> I think we can leave the name though.
>>>
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com]
>>> Sent: Thursday, March 31, 2016 9:18 AM
>>> To: Dean Long <dean.long at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C <michael.c.berg at intel.com>
>>> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp
>>>
>>> Hi Michael,
>>>
>>> With respect to below thread, request help with some questions.
>>> Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size.
>>> Also, the comment I got was about the requirement to extend the bitsInByte table to 512 entries, for consistent mapping of the VecZ register on targets that support it.
>>> But currently, for the uses of bitsInByte in VectorSet::Size() and RegMask::Size(), only the original 256 entries are needed.
>>> I could not find any use of the extended set of values in the bitsInByte table. Is it intended for some future use?
>>>
>>> So for now it seems the original fix for 8149488, only correcting the size in RegMask::Size(), is okay?
>>> (without extending the current bitsInByte array contents; at present, values above 0xFF are never used as indices into bitsInByte in RegMask::Size())
>>>            ----- src/share/vm/libadt/vectset.hpp
>>>            +#define BITS_IN_BYTE_ARRAY_SIZE 256
>>>            +
>>>
>>>            ----- src/share/vm/opto/regmask.cpp
>>>            -  extern uint8_t bitsInByte[512];
>>>            +  extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];
>>>
>>>            ----- src/share/vm/libadt/vectset.cpp
>>>            -uint8_t bitsInByte[256] = {
>>>            +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = {
>>>
>>> I can send revised webrev for above if all okay. Please tell me if I am missing something.
>>>
>>>
>>> OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ?
>>> (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]')
>>>
>>> Thanks,
>>> Rahul
>>>
>>>> -----Original Message-----
>>>> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM
>>>>
>>>>> -----Original Message-----
>>>>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM
>>>>>
>>>>> When do we access elements 256 .. 511?  Wouldn't that mean we have
>>>>> 9-bit bytes?
>>>> Got your point Dean, Thanks.
>>>> I too got some questions here now; will check and reply soon.
>>>>
>>>> -Rahul
>>>>
>>>>> dl
>>>>>
>>>>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote:
>>>>>> Hi,
>>>>>>
>>>>>> With respect to below email thread, request help to review revised webrev.01 for 8149488.
>>>>>>
>>>>>> <bug>: https://bugs.openjdk.java.net/browse/JDK-8149488
>>>>>> <webrev.01>:
>>>>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/
>>>>>>
>>>>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512.
>>>>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512.
>>>>>> Confirmed no issues with 'JPRT -testset hotspot' run.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com]
>>>>>>> Sent: Thursday, February 11, 2016 11:32 PM
>>>>>>> To: hotspot-compiler-dev at openjdk.java.net
>>>>>>> Should we not extend:
>>>>>>>
>>>>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp:
>>>>>>>     uint8_t bitsInByte[256] = { // ...
>>>>>>>
>>>>>>> to 512
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov'
>>>>>>>
>>>>>>> So how do we intend to map a VecZ register without 512 bits?
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: hotspot-compiler-dev
>>>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf
>>>>>>> Of Vladimir Ivanov
>>>>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan;
>>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>>>
>>>>>>> Rahul,
>>>>>>>
>>>>>>> Can we define a constant instead and use it in both places?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov
>>>>>>>
>>>>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please review the patch for JDK- 8149488.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488
>>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/
>>>>>>>>
>>>>>>>> Corrected the bitsInByte array size in declaration.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul
>>>>>>>>


From michael.c.berg at intel.com  Mon Apr  4 20:05:05 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Mon, 4 Apr 2016 20:05:05 +0000
Subject: FW: RFR(S): 8149488: Incorrect declaration of bitsInByte in
	regmask.cpp
References: <f9892734-a3d1-4bce-b9c5-f7df762ba922@default>
	<56FC2A4B.5030905@oracle.com>
	<b8802940-b4cd-4c22-b9cd-a2045c5cafc5@default>
	<d9e683a3-662d-4da9-a9fc-2f7f667f0687@default>
	<C568518E7B433348B114B6A7122D474756E4F914@FMSMSX102.amr.corp.intel.com>
	<56FD74F2.2080102@oracle.com>
	<02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> 
Message-ID: <C568518E7B433348B114B6A7122D474756E5038D@FMSMSX102.amr.corp.intel.com>

FYI

-----Original Message-----
From: Berg, Michael C 
Sent: Monday, April 04, 2016 12:42 PM
To: 'Rahul Raghavan' <rahul.v.raghavan at oracle.com>
Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp

Looks ok Rahul.

Thanks,
Michael

-----Original Message-----
From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com]
Sent: Monday, April 04, 2016 1:09 AM
To: hotspot-compiler-dev at openjdk.java.net
Cc: Dean Long <dean.long at oracle.com>; Berg, Michael C <michael.c.berg at intel.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp

Hi,

Please review the revised fix for JDK- 8149488.

<webrev.02>: http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ 

Based on further checking, and thanks to clarifications from Michael, it was verified that the 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp' (and that the earlier-mentioned extension of the bitsInByte table to 512 entries is not required).

Points from Michael for the record - "
   > I believe Dean is right, I have debugged this and analyzed the usage model, 
   > we never made use of the upper components 
   > and register allocation has been right for VecZ for a good deal of time.
   > 
   > All we need for a change is,
   > Regmask.cpp:
   > 
   > uint RegMask::Size() const {
   >    extern uint8_t bitsInByte[256];
   >
   > A one line change.
   > 
   > -Michael.
   > 
   > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change.  
   > I tested SPECjvm2008, which has tons of EVEX code generated in it, on SKX
   > where we make use of VecZ and the upper bank of registers."

So in this revised webrev.02, I defined BITS_IN_BYTE_ARRAY_SIZE as 256 and used it for the array size.
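Why 256 entries suffice (Dean's question about 9-bit bytes) can be seen in a sketch of a byte-wise bit count in the style of RegMask::Size(). This is a hedged illustration, not the webrev's code: the constant, table, and function names are hypothetical. Every index is masked with 0xFF, so it always lies in [0, 255], and using one shared size constant for both the definition and any extern declaration keeps the two from disagreeing, which is the 256-vs-512 mismatch the bug describes.

```cpp
#include <cassert>
#include <cstdint>

// One shared constant for the table size, in the spirit of
// BITS_IN_BYTE_ARRAY_SIZE (this definition is a sketch). A declaration in
// another file would use it too, e.g.
//   extern uint8_t bitsInByte[kBitsInByteArraySize];
// so the declared and defined sizes cannot drift apart.
constexpr int kBitsInByteArraySize = 256;  // one entry per 8-bit value

// Lookup table: v[i] is the number of set bits in the byte value i.
struct BitsInByteSketch {
  uint8_t v[kBitsInByteArraySize];
  constexpr BitsInByteSketch() : v{} {
    for (int i = 1; i < kBitsInByteArraySize; i++) {
      // popcount(i) = lowest bit of i + popcount(i >> 1)
      v[i] = static_cast<uint8_t>((i & 1) + v[i >> 1]);
    }
  }
};

constexpr BitsInByteSketch bits_in_byte_sketch{};
static_assert(bits_in_byte_sketch.v[0xFF] == 8, "all 8 bits set");

// Count the set bits of a mask made of 32-bit words, one byte at a time.
// Each index is (word >> shift) & 0xFF, i.e. in [0, 255], so entries
// 256..511 of a larger table would never be read.
unsigned count_mask_bits(const uint32_t* words, int n) {
  assert(n >= 0);
  unsigned sum = 0;
  for (int i = 0; i < n; i++) {
    for (int shift = 0; shift < 32; shift += 8) {
      sum += bits_in_byte_sketch.v[(words[i] >> shift) & 0xFF];
    }
  }
  return sum;
}
```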

Confirmed no issues with 'JPRT -testset hotspot' run.

Thanks,
Rahul

> -----Original Message-----
> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM
> 
> Michael, isn't the correct size for this table 256?  I missed how VecZ 
> relates to the table size.
> 
> dl
> 
> On 3/31/2016 9:58 AM, Berg, Michael C wrote:
> > Up until now we have gotten along with the size constraint only.
> > Let us have both the size and the table though for completeness.
> > I think we can leave the name though.
> >
> > -Michael
> >
> > -----Original Message-----
> > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com]
> > Sent: Thursday, March 31, 2016 9:18 AM
> > To: Dean Long <dean.long at oracle.com>; 
> > hotspot-compiler-dev at openjdk.java.net; Berg, Michael C 
> > <michael.c.berg at intel.com>
> > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in 
> > regmask.cpp
> >
> > Hi Michael,
> >
> > With respect to below thread, request help with some questions.
> > Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size.
> > Also, the comment I got was about the requirement to extend the bitsInByte table to 512 entries, for consistent mapping of the VecZ register on targets that support it.
> > But currently, for the uses of bitsInByte in VectorSet::Size() and RegMask::Size(), only the original 256 entries are needed.
> > I could not find any use of the extended set of values in the bitsInByte table. Is it intended for some future use?
> >
> > So for now it seems the original fix for 8149488, only correcting the size in RegMask::Size(), is okay?
> > (without extending the current bitsInByte array contents; at present, values above 0xFF are never used as indices into bitsInByte in RegMask::Size())
> >
> >           ----- src/share/vm/libadt/vectset.hpp
> >           +#define BITS_IN_BYTE_ARRAY_SIZE 256
> >           +
> >
> >           ----- src/share/vm/opto/regmask.cpp
> >           -  extern uint8_t bitsInByte[512];
> >           +  extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];
> >
> >           ----- src/share/vm/libadt/vectset.cpp
> >           -uint8_t bitsInByte[256] = {
> >           +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = {
> >
> > I can send revised webrev for above if all okay. Please tell me if I am missing something.
> >
> >
> > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ?
> > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]')
> >
> > Thanks,
> > Rahul
> >
> >> -----Original Message-----
> >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM
> >>
> >>> -----Original Message-----
> >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM
> >>>
> >>> When do we access elements 256 .. 511?  Wouldn't that mean we have 
> >>> 9-bit bytes?
> >> Got your point Dean, Thanks.
> >> I too got some questions here now; will check and reply soon.
> >>
> >> -Rahul
> >>
> >>> dl
> >>>
> >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote:
> >>>> Hi,
> >>>>
> >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488.
> >>>>
> >>>> <bug>: https://bugs.openjdk.java.net/browse/JDK-8149488
> >>>> <webrev.01>:
> >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/
> >>>>
> >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512.
> >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512.
> >>>> Confirmed no issues with 'JPRT -testset hotspot' run.
> >>>>
> >>>> Thanks,
> >>>> Rahul
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com]
> >>>>> Sent: Thursday, February 11, 2016 11:32 PM
> >>>>> To: hotspot-compiler-dev at openjdk.java.net
> >>>>> Should we not extend:
> >>>>>
> >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp:
> >>>>>    uint8_t bitsInByte[256] = { // ...
> >>>>>
> >>>>> to 512
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov'
> >>>>>
> >>>>> So how do we intend to map a VecZ register without 512 bits?
> >>>>>
> >>>>> -Michael
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: hotspot-compiler-dev
> >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf 
> >>>>> Of Vladimir Ivanov
> >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; 
> >>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>
> >>>>> Rahul,
> >>>>>
> >>>>> Can we define a constant instead and use it in both places?
> >>>>>
> >>>>> Best regards,
> >>>>> Vladimir Ivanov
> >>>>>
> >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Please review the patch for JDK- 8149488.
> >>>>>>
> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488
> >>>>>> Webrev: 
> >>>>>> http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/
> >>>>>>
> >>>>>> Corrected the bitsInByte array size in declaration.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Rahul
> >>>>>>
> 

From doug.simon at oracle.com  Mon Apr  4 21:30:02 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 4 Apr 2016 23:30:02 +0200
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod
Message-ID: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>

The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.

https://bugs.openjdk.java.net/browse/JDK-8153439
http://cr.openjdk.java.net/~dnsimon/8153439
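The idea can be sketched as follows, assuming hypothetical stand-in types (this is not the JVMCI or HotSpot API): the log is installed only if it actually holds something, so an unused log leaves the field NULL and a GC-style visitor has nothing to process. Later in this thread the actual change is reported to fail some JVMCI tests, so this illustrates only the intent, not a vetted implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a speculation log.
struct SpeculationLogSketch {
  std::vector<long> speculations;  // recorded speculations (illustrative)
  bool is_empty() const { return speculations.empty(); }
};

// Hypothetical stand-in for the nmethod field the GC has to visit.
struct CompiledCodeSketch {
  const SpeculationLogSketch* speculation_log = nullptr;
};

// Install the log only when it is non-empty; otherwise leave the field
// null so there is nothing for the GC to process.
void install_speculation_log(CompiledCodeSketch& code, const SpeculationLogSketch* log) {
  code.speculation_log = (log != nullptr && !log->is_empty()) ? log : nullptr;
}

// Stand-in for GC root processing: returns how many log references it visits.
int visit_gc_roots(const CompiledCodeSketch& code) {
  return code.speculation_log != nullptr ? 1 : 0;
}
```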

From igor.veresov at oracle.com  Mon Apr  4 21:46:18 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 4 Apr 2016 14:46:18 -0700
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an
	nmethod
In-Reply-To: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
References: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
Message-ID: <3542A78F-57C3-48D4-ADAB-923760F33EE7@oracle.com>

Looks good.

igor

> On Apr 4, 2016, at 2:30 PM, Doug Simon <doug.simon at oracle.com> wrote:
> 
> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
> 
> https://bugs.openjdk.java.net/browse/JDK-8153439
> http://cr.openjdk.java.net/~dnsimon/8153439


From christian.thalinger at oracle.com  Mon Apr  4 21:50:40 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 4 Apr 2016 11:50:40 -1000
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an
	nmethod
In-Reply-To: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
References: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
Message-ID: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com>

Thanks for the quick turnaround.  Looks good.

> On Apr 4, 2016, at 11:30 AM, Doug Simon <doug.simon at oracle.com> wrote:
> 
> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
> 
> https://bugs.openjdk.java.net/browse/JDK-8153439
> http://cr.openjdk.java.net/~dnsimon/8153439


From christian.thalinger at oracle.com  Mon Apr  4 22:34:20 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 4 Apr 2016 12:34:20 -1000
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an
	nmethod
In-Reply-To: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com>
References: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
	<8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com>
Message-ID: <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com>

No, not good.  We are failing a couple of JVMCI tests.  Looking into it...

> On Apr 4, 2016, at 11:50 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> Thanks for the quick turnaround.  Looks good.
> 
>> On Apr 4, 2016, at 11:30 AM, Doug Simon <doug.simon at oracle.com> wrote:
>> 
>> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
>> 
>> https://bugs.openjdk.java.net/browse/JDK-8153439
>> http://cr.openjdk.java.net/~dnsimon/8153439
> 


From vladimir.kozlov at oracle.com  Mon Apr  4 22:57:15 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 4 Apr 2016 15:57:15 -0700
Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails
	to compile method
In-Reply-To: <56FE7DB5.404@oracle.com>
References: <56FE7DB5.404@oracle.com>
Message-ID: <5702F14B.1020403@oracle.com>

Two tests have the -XX:+PrintCompilation flag added. Why do you need it?

Thanks,
Vladimir

On 4/1/16 6:55 AM, Nils Eliasson wrote:
> Hi all,
>
> Please review this fix.
>
> Summary:
> There is a mismatch in the CompilerWhiteBox testcases between the
> callable and the executable constructors. SimpleTestCase$Helper
> implements all constructors and methods that are tested. However since
> Helper is an inner class there will be an extra (javac created)
> constructor that has the parent class as an appended argument. The
> callable will invoke this constructor, but the executable will reference
> the normal constructor.
>
> Solution:
> Stop having Helper as an inner class. Rename it to
> SimpleTestCaseHelper for some uniqueness in compiler commands and
> directives.
>
> Testing:
> Run all hotspot/compiler/whitebox tests on all platforms, and all
> hotspot/compiler tests on one platform.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880
> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/
>
> Best regards,
> Nils Eliasson
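A minimal Java sketch of the constructor mismatch described above. It uses hypothetical class names (TestCaseSketch, Helper, StaticHelper), not the real test sources. Reflection shows that the declared constructor of a non-static inner class carries an extra parameter of the enclosing class type, while a static nested class keeps its source signature. This is consistent with a callable and an executable ending up on different constructors.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical stand-in for a test class with a helper, loosely mirroring SimpleTestCase$Helper.
class TestCaseSketch {
    // Non-static inner class: javac adds the enclosing TestCaseSketch instance
    // as an extra constructor parameter, even though the source declares none.
    class Helper {
        Helper() { }
    }

    // Static nested class: no enclosing instance, so the signature is unchanged.
    static class StaticHelper {
        StaticHelper() { }
    }
}

class InnerCtorDemo {
    // Returns the parameter types of the single declared constructor of c.
    static Class<?>[] ctorParams(Class<?> c) {
        Constructor<?>[] ctors = c.getDeclaredConstructors();
        if (ctors.length != 1) {
            throw new IllegalStateException("expected exactly one constructor in " + c);
        }
        return ctors[0].getParameterTypes();
    }

    public static void main(String[] args) {
        // Shows one parameter: the enclosing class TestCaseSketch.
        System.out.println(Arrays.toString(ctorParams(TestCaseSketch.Helper.class)));
        // Shows no parameters.
        System.out.println(Arrays.toString(ctorParams(TestCaseSketch.StaticHelper.class)));
    }
}
```

Moving the helper to a top-level class, as the fix does, removes the hidden parameter. Making it a static nested class would likely have the same effect for this particular issue.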

From vladimir.kozlov at oracle.com  Mon Apr  4 23:12:58 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 4 Apr 2016 16:12:58 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
Message-ID: <5702F4FA.3090305@oracle.com>

Looks good to me.

Thanks,
Vladimir

On 4/1/16 11:28 AM, Igor Veresov wrote:
> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>
> Thanks,
> igor
>

From igor.veresov at oracle.com  Mon Apr  4 23:17:08 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 4 Apr 2016 16:17:08 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <5702F4FA.3090305@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5702F4FA.3090305@oracle.com>
Message-ID: <77FFC56E-2B29-4D0F-8EB6-C181DFFD895D@oracle.com>

Thanks, Vladimir! Can I please get another review from the runtime team?

igor

> On Apr 4, 2016, at 4:12 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good to me.
> 
> Thanks,
> Vladimir
> 
> On 4/1/16 11:28 AM, Igor Veresov wrote:
>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>> 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>> 
>> Thanks,
>> igor
>> 


From vladimir.kozlov at oracle.com  Mon Apr  4 23:25:02 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 4 Apr 2016 16:25:02 -0700
Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler
In-Reply-To: <C568518E7B433348B114B6A7122D474756E4FEAB@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E4FEAB@FMSMSX102.amr.corp.intel.com>
Message-ID: <5702F7CE.3040600@oracle.com>

Bug number in links is incorrect. Should be:

https://bugs.openjdk.java.net/browse/JDK-8151003
http://cr.openjdk.java.net/~mcberg/8151003/webrev.02/

Changes look good. Very nice cleanup. I will start testing.

I see you changed the code for AVX > 2 in macroAssembler_x86.hpp. Is it 
because the new instructions are faster, or to avoid mixing evex and non-evex 
instructions?

Thanks,
Vladimir

On 4/1/16 2:51 PM, Berg, Michael C wrote:
> Hi All,
>
> I would like to contribute some clean up on the x86 assembler applied to
> vex encoding to address the usage of the nds assembler parameter.
>
> For all instructions which use nds source xmm registers, the validity
> check has been removed.  It was originally placed here:
>
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269
>
> And propagated.  Now nds register usage is fully compliant with each ISA
> description.
>
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001
> webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/
>
> Thanks,
>
> Michael
>
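Some background on the nds operand. In the VEX encoding it is carried in the vvvv field in one's-complement form. An instruction that does not use it must encode vvvv as 1111b, which is the same bit pattern as xmm0.

The Java sketch below builds a 2-byte (C5) VEX register-to-register encoding to illustrate this. The class and method names are hypothetical, and this is not HotSpot's assembler code. It assumes W=0, the 0F opcode map, and an rm register below xmm8, which is all the 2-byte form can express.

```java
// Hypothetical sketch of a 2-byte VEX (C5) register-to-register encoding.
// Second prefix byte layout: [R-bar | vvvv (4 bits, inverted) | L | pp].
class VexSketch {
    static final int NO_NDS = -1; // marker for "instruction has no nds operand"

    static int vex2SecondByte(int reg, int nds, boolean vector256, int pp) {
        int rBar = (reg & 0x8) == 0 ? 1 : 0;             // inverted bit 3 of ModRM.reg
        int vvvv = (nds == NO_NDS) ? 0xF : (~nds & 0xF); // one's complement; unused must be 1111b
        int l = vector256 ? 1 : 0;                       // 0 = 128-bit (xmm), 1 = 256-bit (ymm)
        return (rBar << 7) | (vvvv << 3) | (l << 2) | (pp & 0x3);
    }

    // Encodes "op reg, nds, rm" with register operands only (ModRM.mod = 11).
    static int[] encodeRegReg(int opcode, int reg, int nds, int rm, boolean vector256, int pp) {
        if (rm > 7) {
            throw new IllegalArgumentException("2-byte VEX cannot extend ModRM.rm; use the 3-byte C4 form");
        }
        int modrm = 0xC0 | ((reg & 0x7) << 3) | (rm & 0x7);
        return new int[] {0xC5, vex2SecondByte(reg, nds, vector256, pp), opcode & 0xFF, modrm};
    }

    public static void main(String[] args) {
        // vaddps xmm1, xmm2, xmm3: VEX.128.0F 58 /r with nds = xmm2.
        for (int b : encodeRegReg(0x58, 1, 2, 3, false, 0)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

For vaddps xmm1, xmm2, xmm3 this yields C5 E8 58 CB. Passing NO_NDS produces the same vvvv bits as passing xmm0.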

From michael.c.berg at intel.com  Mon Apr  4 23:30:04 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Mon, 4 Apr 2016 23:30:04 +0000
Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler
In-Reply-To: <5702F7CE.3040600@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E4FEAB@FMSMSX102.amr.corp.intel.com>
	<5702F7CE.3040600@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E504DC@FMSMSX102.amr.corp.intel.com>

Before, we were aliasing, which introduced some ambiguity regarding AVX2 and EVEX usage, since the aliased forms had additional programming via the imm field.  This way they are fully separate.

Thanks,
Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Monday, April 04, 2016 4:25 PM
To: Berg, Michael C <michael.c.berg at intel.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(M) 8151003 remove nds validity checks from vex x86 assembler

Bug number in links is incorrect. Should be:

https://bugs.openjdk.java.net/browse/JDK-8151003
http://cr.openjdk.java.net/~mcberg/8151003/webrev.02/

Changes look good. Very nice cleanup. I will start testing.

I see you changed the code for AVX > 2 in macroAssembler_x86.hpp. Is it because the new instructions are faster, or to avoid mixing evex and non-evex instructions?

Thanks,
Vladimir

On 4/1/16 2:51 PM, Berg, Michael C wrote:
> Hi All,
>
> I would like to contribute some clean up on the x86 assembler applied 
> to vex encoding to address the usage of the nds assembler parameter.
>
> For all instructions which use nds source xmm registers, the validity 
> check has been removed.  It was originally placed here:
>
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269
>
> And propagated.  Now nds register usage is fully compliant with each 
> ISA description.
>
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001
> webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/
>
> Thanks,
>
> Michael
>

From michael.c.berg at intel.com  Mon Apr  4 23:32:25 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Mon, 4 Apr 2016 23:32:25 +0000
Subject: CR for RFR 8151573
In-Reply-To: <56FF3A04.5090601@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E4C64D@FMSMSX102.amr.corp.intel.com>
	<56E881A9.7070004@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C6A9@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4C6DB@FMSMSX102.amr.corp.intel.com>
	<56E89CA4.8010201@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C719@FMSMSX102.amr.corp.intel.com>
	<56E97EC5.6030608@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C8E8@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4F529@FMSMSX102.amr.corp.intel.com>
	<56FC5852.2030101@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4F60F@FMSMSX102.amr.corp.intel.com>
	<56FCADE3.20403@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4FBFA@FMSMSX102.amr.corp.intel.com>
	<56FF3A04.5090601@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E504F7@FMSMSX102.amr.corp.intel.com>

Vladimir, did you restart the integration testing after the small change I sent?

Regards,
Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Friday, April 01, 2016 8:18 PM
To: Berg, Michael C <michael.c.berg at intel.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: CR for RFR 8151573

I am starting pre-integration testing.

Thanks,
Vladimir

On 3/31/16 8:36 PM, Berg, Michael C wrote:
> Vladimir, I think I have addressed every concern in the latest webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/
>
> I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations.  Adding more parameters didn't seem to be a win to get around it.
> The code is fully retested with no issues.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, March 30, 2016 9:56 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; 
> 'hotspot-compiler-dev at openjdk.java.net' 
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: CR for RFR 8151573
>
> On 3/30/16 4:57 PM, Berg, Michael C wrote:
>> See below for context.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 30, 2016 3:51 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>; 
>> 'hotspot-compiler-dev at openjdk.java.net'
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: CR for RFR 8151573
>>
>> Michael,
>>
>> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes.
>>
>> multi_version_post_loops() can use is_canonical_main_loop_entry()
>> from 8148754, but you need to modify it to move the
>> is_Main() assert to the other call sites.
>>
>> <MCB> OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also, the canonical test is different.  I can leave the name, but it will then be overloaded with two types of functionality.  The graph shape test differs between main and post loops. Perhaps it is better to leave them separate, given that they are different?  I will leave this one for last so that we have time to discuss it.
>
> I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)?
>
>>
>> I did not get rce'd post loop checks in loopnode.cpp.
>>
>> <MCB> First I will have to explain what I am doing with do_range_check().  That code answers the question: do all the range checks have load ranges? Only loops of this type are canonical for multiversioning.
>> Basically, that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit.  In those cases we really do want to know whether we have non-canonical forms, so I think leaving the code as is in do_range_check() is key to that discovery.  Then insert_scalar_rced_post_loop(...) creates a copy of main, which successfully passed range check elimination and is canonical.  Basically, this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so or for the optimizer to realize that no such case exists in which case the clean loop becomes the one and only copy.  If we cannot multiversion transform the loop we added we eliminate it.
>
> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>
> You should add a comment in do_range_check() about what you said (looking only for range checks). Also, don't use the old_new
>    struct; use a simple counter and return it as the result (or return a bool from do_range_check()) to indicate that only range checks were in the loop.
>
> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand, you are looking for post loops that are not created by insert_scalar_rced_post_loop(). Right?
> There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
> Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
>
>>
>> Swap next checks since has_range_checks() may be expensive scanning loop body:
>> +  // only process RCE'd main loops
>> +  if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>
>> <MCB> Ok, makes sense.
>
> Sorry, I made a mistake because you use the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag; the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But please rename the has_range_checks(cl) method to avoid confusion.
>
> Thanks,
> Vladimir
>
>>
>> Also, if there are no checks in the loop, you will scan the loop's body again since the no_checks state is not recorded.
>> I would suggest scanning the body unconditionally, because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>
>> <MCB> I think the real problem is to avoid scanning more than once after we check. I will move towards that solution.
>>
>>
>> Why do you need local copies?
>>
>> -  visited.Clear();
>> -  clones.clear();
>> +  Arena *a = Thread::current()->resource_area();
>> +  VectorSet visited(a);
>> +  Node_Stack clones(a, main_head->back_control()->outcnt());
>>
>> <MCB> I will look into this, and see if it can be cleaned up.
>>
>>
>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set.
>>
>> <MCB> Ok, I will look into a version without PostLoopInfo.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>>> Here is an update after full testing, the webrev is:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>>
>>> Please review and comment,
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>>> Berg, Michael C
>>> Sent: Wednesday, March 16, 2016 10:30 AM
>>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; 
>>> 'hotspot-compiler-dev at openjdk.java.net'
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: RE: CR for RFR 8151573
>>>
>>> Putting a hold on the review, retesting everything on my end.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, March 16, 2016 8:42 AM
>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: CR for RFR 8151573
>>>
>>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>>> Vladimir:
>>>>
>>>> The reason programmable SIMD depends upon this is that all versions of the final post loop keep their range checks until very late, after register allocation; they might be cleaned up by cfg optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>>
>>> I understand that we can get some benefits. But in the general case they will not be visible.
>>>
>>>>
>>>> With regard to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases?
>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>>
>>> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop.
>>>
>>> After thinking about this, I would suggest that you look at the arraycopy and generate_fill stubs in stub_Generator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform specific only, smaller, and easier to understand and test. Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops.
>>>
>>> Regards,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> As we all know, we can always construct microbenchmarks which show a
>>>> 30% - 50% difference, while in real applications we will never see a
>>>> difference. I still don't see a real reason why we should spend
>>>> time optimizing
>>>> *POST* loops. We already have a vectorized post loop to improve performance. Note that additional loop opts code will raise its maintenance cost.
>>>>
>>>> Why "programmable SIMD" depends on it? What about pre-loop?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>>> Correction below...
>>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-compiler-dev
>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf 
>>>>> Of Berg, Michael C
>>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: RE: CR for RFR 8151573
>>>>>
>>>>> Vladimir for programmable SIMD which is the optimization which uses this implementation, I get the following on micros and code in general that look like this:
>>>>>
>>>>>          for(int i = 0; i < process_len; i++)
>>>>>          {
>>>>>            d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>>          }
>>>>>
>>>>> The above code makes 9 vector ops.
>>>>>
>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyways.
>>>>> The value process_len is some fraction of the array length in my measurements.  The idea of the metrics is to pose a post loop with a modest number of iterations in it.  For instance, if N is the max trip count of the post loop and N is 1..VecZ-1 in size, then for float we could do as many as 15 iterations in the fixup loop.
>>>>>
>>>>> An example: with array_length = 512 and process_len in the range 81..96, we create a VecZ loop which was superunrolled 4 times with vector length 16 (an unroll of 64), we process 4 iterations for alignment, and the vectorized post loop is executed 1 time, leaving the remaining work to the final post loop, in this case possibly a multiversioned post loop.  We start that final loop at iteration 81, so we always do at least 1 fixup iteration and as many as 15.  If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations, or 6 as a group, to get us to index 80.  By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for all values of process_len in 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, a range of 7 to 21 iterations.
>>>>>
>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>>
>>>>> I thought it would be easier to do them separately.  Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: Re: CR for RFR 8151573
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> The changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> I would like to contribute multi-versioning post loops for range 
>>>>>> check elimination.  Previously, post loop optimizations for range
>>>>>> checks were done by cfg optimizations after register allocation.
>>>>>> I have added code which produces the desired effect much
>>>>>> earlier by introducing a safe transformation which will minimally
>>>>>> allow a range-check-free version of the final post loop to
>>>>>> execute up until the point it actually has to take a range check
>>>>>> exception, by re-ranging the limit of the rce'd loop; execution then exits
>>>>>> the rce'd post loop and takes the range check exception in the legacy loop if required.
>>>>>> If during optimization we discover that we know enough to remove 
>>>>>> the range check version of the post loop, mostly by exposing the 
>>>>>> load range values into the limit logic of the rce'd post loop, we 
>>>>>> will eliminate the range check post loop altogether much like cfg 
>>>>>> optimizations did, but much earlier.  This gives optimizations 
>>>>>> like programmable SIMD (via SuperWord) the opportunity to 
>>>>>> vectorize the rce'd post loops to a single iteration based on 
>>>>>> mask vectors which map to the residual iterations. Programmable 
>>>>>> SIMD will be a follow on change set utilizing this code to stage 
>>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations.
>>>>>> Currently I have enabled this optimization for x86 only.  We base 
>>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added.
>>>>>>
>>>>>> This code was tested as follows:
>>>>>>
>>>>>>
>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>>
>>>>>>
>>>>>> webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Michael
>>>>>>
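At the source level, the transformation proposed in the original RFR can be pictured as splitting the final post loop in two:

- a copy whose limit is re-ranged so that no index can go out of bounds;
- the original checked loop, which runs any remaining iterations and throws at the same point the unsplit loop would.

The Java sketch below is only an analogy under that assumption, and the method name is hypothetical. C2 applies the transformation to its IR, whereas in plain Java the JVM still bounds-checks both loops.

```java
// Source-level analogy of a multiversioned post loop (hypothetical helper, not C2 code).
class MultiversionSketch {
    // Adds src[i] into dst[i] for i in [from, limit).
    static void addRange(int[] dst, int[] src, int from, int limit) {
        // Re-ranged limit: every index in [from, safeLimit) is in bounds for both arrays
        // (assuming from >= 0), so a compiler could drop the range checks in this copy.
        int safeLimit = Math.min(limit, Math.min(src.length, dst.length));
        int i = from;
        for (; i < safeLimit; i++) {
            dst[i] += src[i];
        }
        // Legacy copy: runs only if limit exceeds an array length, and keeps the checks,
        // so the exception is thrown at the same iteration as in the unsplit loop.
        for (; i < limit; i++) {
            dst[i] += src[i];
        }
    }

    public static void main(String[] args) {
        int[] dst = {1, 2, 3, 4};
        int[] src = {10, 20, 30, 40};
        addRange(dst, src, 1, 3);
        System.out.println(java.util.Arrays.toString(dst)); // [1, 22, 33, 4]
    }
}
```

When the limit never exceeds the array lengths, the second loop runs zero times. This is the case where, per the RFR, the checked version could be eliminated altogether.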

From christian.thalinger at oracle.com  Tue Apr  5 00:00:05 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 4 Apr 2016 14:00:05 -1000
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an
	nmethod
In-Reply-To: <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com>
References: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
	<8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com>
	<3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com>
Message-ID: <F2652198-15CC-4C2E-9E92-28594F5D4F0E@oracle.com>


> On Apr 4, 2016, at 12:34 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> No, not good.  We are failing a couple of JVMCI tests.  Looking into it...

Ok, this got a little out of control but for the better:

http://cr.openjdk.java.net/~twisti/8153439/webrev.01/

The actual fix is to check for a null log argument.  The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE.  This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions.

While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone.  Again, stupid jtreg.

Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog.  I think it should.
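A minimal Java sketch of the two ideas in this message:

- treat a null log argument as "nothing to install";
- ask the log itself, through an interface method, whether it holds any speculations.

The interface and class names are hypothetical stand-ins, not the real JVMCI SpeculationLog API. Modeling hasSpeculations() as a default method is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a speculation log interface (not jdk.vm.ci.meta.SpeculationLog).
interface SpeculationLogSketch {
    List<Object> speculations();

    // Interface-level query: does this log contain anything worth recording?
    default boolean hasSpeculations() {
        return !speculations().isEmpty();
    }
}

class SimpleSpeculationLog implements SpeculationLogSketch {
    private final List<Object> entries = new ArrayList<>();

    void add(Object speculation) {
        entries.add(speculation);
    }

    @Override
    public List<Object> speculations() {
        return entries;
    }
}

class InstallSketch {
    // Returns the log to attach to the compiled method, or null when there is nothing
    // to attach: either no log was passed in, or the log holds no speculations.
    static SpeculationLogSketch logToInstall(SpeculationLogSketch log) {
        if (log == null || !log.hasSpeculations()) {
            return null;
        }
        return log;
    }

    public static void main(String[] args) {
        SimpleSpeculationLog log = new SimpleSpeculationLog();
        System.out.println(logToInstall(null) == null); // true: null argument
        System.out.println(logToInstall(log) == null);  // true: empty log
        log.add("speculation-1");
        System.out.println(logToInstall(log) == log);   // true: non-empty log is kept
    }
}
```

Putting the query on the interface lets each log implementation decide what "empty" means. The installing code then only needs the null check plus one call.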

> 
>> On Apr 4, 2016, at 11:50 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>> 
>> Thanks for the quick turnaround.  Looks good.
>> 
>>> On Apr 4, 2016, at 11:30 AM, Doug Simon <doug.simon at oracle.com> wrote:
>>> 
>>> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
>>> 
>>> https://bugs.openjdk.java.net/browse/JDK-8153439
>>> http://cr.openjdk.java.net/~dnsimon/8153439
>> 
> 


From igor.veresov at oracle.com  Tue Apr  5 00:10:41 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 4 Apr 2016 17:10:41 -0700
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an
	nmethod
In-Reply-To: <F2652198-15CC-4C2E-9E92-28594F5D4F0E@oracle.com>
References: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
	<8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com>
	<3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com>
	<F2652198-15CC-4C2E-9E92-28594F5D4F0E@oracle.com>
Message-ID: <D9EBD90C-50C8-431B-A2BB-350139FF3DDD@oracle.com>

Seems alright.

igor

> On Apr 4, 2016, at 5:00 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> 
>> On Apr 4, 2016, at 12:34 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>> 
>> No, not good.  We are failing a couple of JVMCI tests.  Looking into it...
> 
> Ok, this got a little out of control but for the better:
> 
> http://cr.openjdk.java.net/~twisti/8153439/webrev.01/
> 
> The actual fix is to check for a null log argument.  The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE.  This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions.
> 
> While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone.  Again, stupid jtreg.
> 
> Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog.  I think it should.
> 
>> 
>>> On Apr 4, 2016, at 11:50 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>> 
>>> Thanks for the quick turnaround.  Looks good.
>>> 
>>>> On Apr 4, 2016, at 11:30 AM, Doug Simon <doug.simon at oracle.com> wrote:
>>>> 
>>>> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
>>>> 
>>>> https://bugs.openjdk.java.net/browse/JDK-8153439
>>>> http://cr.openjdk.java.net/~dnsimon/8153439
>>> 
>> 
> 


From christian.thalinger at oracle.com  Tue Apr  5 00:24:12 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 4 Apr 2016 14:24:12 -1000
Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should
	not assume asserts are benign
In-Reply-To: <570265D1.6040905@oracle.com>
References: <56FE5D00.7000209@oracle.com> <570265D1.6040905@oracle.com>
Message-ID: <46818653-AC18-4C9F-BBEB-B1F98CC39FBA@oracle.com>

Should be alright.

> On Apr 4, 2016, at 3:02 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
> 
> On 04/01/2016 02:35 PM, Aleksey Shipilev wrote:
>> Hi,
>> 
>> compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify
>> String Concat strategies, because some of them load new methods
>> and use them during String concat linkage and execution. Notably, this
>> will happen inside of the asserts. We need to prime the asserts before
>> using them in-between counter polls.
>> 
>> Bug:
>>  https://bugs.openjdk.java.net/browse/JDK-8153265
>> 
>> Webrev:
>>  http://cr.openjdk.java.net/~shade/8153265/webrev.00/
>> 
>> Testing: offending test in oob/-Xcomp modes
> 
> Anyone?
> 
> Thanks,
> -Aleksey


From vladimir.kozlov at oracle.com  Tue Apr  5 01:29:54 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 4 Apr 2016 18:29:54 -0700
Subject: CR for RFR 8151573
In-Reply-To: <C568518E7B433348B114B6A7122D474756E50046@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E4C64D@FMSMSX102.amr.corp.intel.com>
	<56E881A9.7070004@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C6A9@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4C6DB@FMSMSX102.amr.corp.intel.com>
	<56E89CA4.8010201@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C719@FMSMSX102.amr.corp.intel.com>
	<56E97EC5.6030608@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4C8E8@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E4F529@FMSMSX102.amr.corp.intel.com>
	<56FC5852.2030101@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4F60F@FMSMSX102.amr.corp.intel.com>
	<56FCADE3.20403@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4FBFA@FMSMSX102.amr.corp.intel.com>
	<56FF3A04.5090601@oracle.com>
	<C568518E7B433348B114B6A7122D474756E4FFCA@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E50046@FMSMSX102.amr.corp.intel.com>
Message-ID: <57031512.2060607@oracle.com>

Changes look good.

I resubmitted testing with the new changes (4a).

Thanks,
Vladimir

On 4/1/16 10:16 PM, Berg, Michael C wrote:
> That small revision is reflected in:
>
> https://bugs.openjdk.java.net/browse/JDK-8151573
>
> and can be accessed at:
>
> http://cr.openjdk.java.net/~mcberg/8151573/webrev.04a/
>
> Regards,
> Michael
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
> Sent: Friday, April 01, 2016 8:25 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: RE: CR for RFR 8151573
>
> I have to make a two line change, I am testing it on my end. I will pass the webrev directly to you when my tests conclude.
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, April 01, 2016 8:18 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: CR for RFR 8151573
>
> I am starting pre-integration testing.
>
> Thanks,
> Vladimir
>
> On 3/31/16 8:36 PM, Berg, Michael C wrote:
>> Vladimir, I think I have addressed every concern in the latest webrev:
>>
>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/
>>
>> I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations.  Adding more parameters didn't seem to be a win to get around it.
>> The code is fully retested with no issues.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 30, 2016 9:56 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>;
>> 'hotspot-compiler-dev at openjdk.java.net'
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: CR for RFR 8151573
>>
>> On 3/30/16 4:57 PM, Berg, Michael C wrote:
>>> See below for context.
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, March 30, 2016 3:51 PM
>>> To: Berg, Michael C <michael.c.berg at intel.com>;
>>> 'hotspot-compiler-dev at openjdk.java.net'
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: CR for RFR 8151573
>>>
>>> Michael,
>>>
>>> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes.
>>>
>>> multi_version_post_loops() can use is_canonical_main_loop_entry()
>>> from 8148754, but you need to modify it to move the
>>> is_Main() assert to the other call sites.
>>>
>>> <MCB> OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also, the canonical test is different.  I can leave the name, but it will then be overloaded with two kinds of functionality.  The graph shape test differs between main and post loops. Perhaps it is better to leave them separate given they are different?  I will leave this one to last so that we have time to discuss it.
>>
>> I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)?
>>
>>>
>>> I did not get rce'd post loop checks in loopnode.cpp.
>>>
>>> <MCB> First I will have to explain what I am doing with do_range_check().  That code answers the question: do all the range checks have load ranges (LoadRange nodes)? Only such loops are canonical for multiversioning.
>>> Basically that means that even if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit.  In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check() is key to that discovery.  Then insert_scalar_rced_post_loop(...) creates a copy of main, which successfully passed range check elimination and is canonical.  Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that it never reaches an index that would fail a range check, leaving that to the final loop, or to the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy.  If we cannot multiversion-transform the loop we added, we eliminate it.
>>
>> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>>
>> You should add a comment in do_range_check() about what you said (looking only for range checks). Also, don't use the old_new
>> struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop.
>>
>> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand, you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right?
>> There is no comment what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
>> Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
>>
>>>
>>> Swap next checks since has_range_checks() may be expensive scanning loop body:
>>> +  // only process RCE'd main loops
>>> +  if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>>
>>> <MCB> Ok, makes sense.
>>
>> Sorry, I made a mistake due to your using the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag, and the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
>> But, please, rename has_range_checks(cl) method to avoid confusion.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Also, if there are no checks in the loop, you will scan the loop's body again since the no_checks state is not recorded.
>>> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>>
>>> <MCB> I think the real problem is to not scan more than once after we check. I will move towards that solution.
>>>
>>>
>>> Why you need local copies?:
>>>
>>> -  visited.Clear();
>>> -  clones.clear();
>>> +  Arena *a = Thread::current()->resource_area();
>>> +  VectorSet visited(a);
>>> +  Node_Stack clones(a, main_head->back_control()->outcnt());
>>>
>>> <MCB> I will look into this, and see if it can be cleaned up.
>>>
>>>
>>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to be set.
>>>
>>> <MCB> Ok, I will look into a version without PostLoopInfo.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>>>> Here is an update after full testing, the webrev is:
>>>>
>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>>>
>>>> Please review and comment,
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev
>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>>> Berg, Michael C
>>>> Sent: Wednesday, March 16, 2016 10:30 AM
>>>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>;
>>>> 'hotspot-compiler-dev at openjdk.java.net'
>>>> <hotspot-compiler-dev at openjdk.java.net>
>>>> Subject: RE: CR for RFR 8151573
>>>>
>>>> Putting a hold on the review, retesting everything on my end.
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Wednesday, March 16, 2016 8:42 AM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>>>> Vladimir:
>>>>>
>>>>> The reason programmable SIMD depends on this is that all versions of the final post loop have range checks in them until very late, after register allocation; they might be cleaned up by CFG optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>>>
>>>> I understand that we can get some benefits. But in the general case they will not be visible.
>>>>
>>>>>
>>>>> With regard to the pre loop: pre loops have special checks too, do they not, requiring flow in many cases?
>>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>>>
>>>> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop.
>>>>
>>>> After thinking about this, I would suggest that you look at the arraycopy and generate_fill stubs in stub_Generator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform-specific only, smaller, and easier to understand and test. Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops.
>>>>
>>>> Regards,
>>>> Vladimir
>>>>
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: Re: CR for RFR 8151573
>>>>>
>>>>> As we all know, we can always construct microbenchmarks which show a
>>>>> 30%-50% difference, while in real applications we will never see a
>>>>> difference. I still don't see a real reason why we should spend
>>>>> time optimizing
>>>>> *POST* loops. We already have a vectorized post loop to improve performance. Note that additional loop opts code will raise its maintenance cost.
>>>>>
>>>>> Why does "programmable SIMD" depend on it? What about the pre-loop?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>>>> Correction below...
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: hotspot-compiler-dev
>>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf
>>>>>> Of Berg, Michael C
>>>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>>>> Subject: RE: CR for RFR 8151573
>>>>>>
>>>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this:
>>>>>>
>>>>>>           for(int i = 0; i < process_len; i++)
>>>>>>           {
>>>>>>             d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>>>           }
>>>>>>
>>>>>> The above code makes 9 vector ops.
>>>>>>
>>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway.
>>>>>> The value process_len is some fraction of the array length in my measurements.  The idea of the metrics is to pose a post loop with a modest number of iterations in it.  For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop.
>>>>>>
>>>>>> An example: array_length = 512 and process_len ranges over 81..96. We create a VecZ loop which was super-unrolled 4 times with vector length 16 (an unroll of 64); the alignment loop processes 4 iterations, and the vectorized post loop is executed once, leaving the remaining work to the final post loop, in this case possibly a multiversioned post loop.  We start that final loop at iteration 81, so we always do at least 1 iteration of fixup, and as many as 15.  If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80.  By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for all values in 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations.
>>>>>>
>>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>>>
>>>>>> I thought it would be easier to do them separately.  Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>>> Subject: Re: CR for RFR 8151573
>>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> Changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> I would like to contribute multi-versioning post loops for range
>>>>>>> check elimination.  Previously, post loop optimizations for range
>>>>>>> checks were done by CFG optimizations after register allocation.
>>>>>>> I have added code which produces the desired effect much
>>>>>>> earlier by introducing a safe transformation which will minimally
>>>>>>> allow a range-check-free version of the final post loop to
>>>>>>> execute up until the point it actually has to take a range check
>>>>>>> exception, by re-ranging the limit of the rce'd loop; we then exit
>>>>>>> the rce'd post loop and take the range check exception in the legacy loop's execution if required.
>>>>>>> If during optimization we discover that we know enough to remove
>>>>>>> the range check version of the post loop, mostly by exposing the
>>>>>>> load range values into the limit logic of the rce'd post loop, we
>>>>>>> will eliminate the range check post loop altogether much like cfg
>>>>>>> optimizations did, but much earlier.  This gives optimizations
>>>>>>> like programmable SIMD (via SuperWord) the opportunity to
>>>>>>> vectorize the rce'd post loops to a single iteration based on
>>>>>>> mask vectors which map to the residual iterations. Programmable
>>>>>>> SIMD will be a follow on change set utilizing this code to stage
>>>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations.
>>>>>>> Currently I have enabled this optimization for x86 only.  We base
>>>>>>> this loop on successfully rce'd main loops, and if multiversioning fails for whatever reason, we eliminate the loop we added.
>>>>>>>
>>>>>>> This code was tested as follows:
>>>>>>>
>>>>>>>
>>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>>>
>>>>>>>
>>>>>>> webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Michael
>>>>>>>

From vivek.r.deshpande at intel.com  Tue Apr  5 06:25:46 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Tue, 5 Apr 2016 06:25:46 +0000
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>

Hi Christian

We have updated the patch as per the suggested changes.
The webrev for the same is at this location for your review.
http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/

We will soon send another patch for CompilerDirectives changes.

Regards,
Vivek

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Tuesday, March 29, 2016 11:29 AM
To: Rukmannagari, Shravya
Cc: Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86


On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya <shravya.rukmannagari at intel.com> wrote:

Hi Christian,
We would add separate files for each intrinsic. By splitting the CompilerDirectives, do you mean we have to add a separate file? Sorry, I didn't exactly get it.

Oh, sorry, I wasn't clear enough.  Please file a new enhancement for the CompilerDirectives changes and integrate them separately.


Thanks,
Shravya Rukmannagari.

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, March 28, 2016 5:18 PM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; Rukmannagari, Shravya <shravya.rukmannagari at intel.com>
Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86

I left this comment in the bug:

I think for the saneness of the macroAssembler_libm_x86_*.cpp files we should put every intrinsic in its own file, like we did for macroAssembler_x86_sha.cpp. They are already too big:

$ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
    4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
    3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp

Also, can we split out the CompilerDirectives changes?


On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:

Hi all

We would like to contribute a patch which optimizes tan and log10 on the x86 architecture using the Intel LIBM library.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8152907
webrev:
http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/

Thanks and regards,
Vivek


From tobias.hartmann at oracle.com  Tue Apr  5 07:11:40 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Apr 2016 09:11:40 +0200
Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks
Message-ID: <5703652C.6000000@oracle.com>

Hi,

please review the following patch.

https://bugs.openjdk.java.net/browse/JDK-8151724
http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/

The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag.

Tested with JPRT and RBT.

Thanks,
Tobias

From zoltan.majo at oracle.com  Tue Apr  5 07:15:43 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 5 Apr 2016 09:15:43 +0200
Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks
In-Reply-To: <5703652C.6000000@oracle.com>
References: <5703652C.6000000@oracle.com>
Message-ID: <5703661F.9040403@oracle.com>

Hi Tobias,


that looks good to me. Thank you for fixing this issue.

Best regards,


Zoltan

On 04/05/2016 09:11 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8151724
> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/
>
> The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag.
>
> Tested with JPRT and RBT.
>
> Thanks,
> Tobias


From tobias.hartmann at oracle.com  Tue Apr  5 07:22:25 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Apr 2016 09:22:25 +0200
Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks
In-Reply-To: <5703661F.9040403@oracle.com>
References: <5703652C.6000000@oracle.com> <5703661F.9040403@oracle.com>
Message-ID: <570367B1.9080600@oracle.com>

Hi Zoltan,

thanks for the review!

Best regards,
Tobias

On 05.04.2016 09:15, Zoltán Majó wrote:
> Hi Tobias,
> 
> 
> that looks good to me. Thank you for fixing this issue.
> 
> Best regards,
> 
> 
> Zoltan
> 
> On 04/05/2016 09:11 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8151724
>> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/
>>
>> The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag.
>>
>> Tested with JPRT and RBT.
>>
>> Thanks,
>> Tobias
> 

From martin.doerr at sap.com  Tue Apr  5 10:10:09 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Apr 2016 10:10:09 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <57020636.7010806@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
Message-ID: <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>

Hi Jamsheed,

thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :)

My webrev addresses some aspects which are not covered by your fix:

- add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. These new instances need to be published with releasing stores as well.

- The readers of the _exception_cache field are not yet safe. As Andrew Haley pointed out, optimizing compilers may transform load accesses to non-volatile fields.

So I think my change is still needed.

And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like.
Would you agree?

Best regards,
Martin


From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m
Sent: Montag, 4. April 2016 08:14
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Martin,

"nmethod's exception cache not multi-thread safe"  bug is fixed in b107
bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html

Best Regards,
Jamsheed
On 4/1/2016 6:07 PM, Doerr, Martin wrote:
Hello everyone,

we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.

The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.)
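The publication pattern described above (lock-serialized writers, lock-free readers, new entries made visible by a releasing store) can be sketched in portable C++. This is a simplified, hypothetical analogue, not HotSpot's ExceptionCache: each entry holds a single pc/handler pair, all names are invented, and the reader uses `memory_order_acquire` where the patch relies on the address dependency instead.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified stand-in for one cache entry.
struct CacheEntry {
  std::uintptr_t pc;       // lookup key
  std::uintptr_t handler;  // cached handler address
  CacheEntry* next;        // never modified after publication
};

class ExceptionCacheSketch {
 public:
  ~ExceptionCacheSketch() {
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* n = e->next;
      delete e;
      e = n;
    }
  }

  // Writers are serialized by the lock.
  void add(std::uintptr_t pc, std::uintptr_t handler) {
    std::lock_guard<std::mutex> guard(lock_);
    // Fully initialize the entry before it becomes reachable.
    CacheEntry* e =
        new CacheEntry{pc, handler, head_.load(std::memory_order_relaxed)};
    // Releasing store: a reader that observes `e` also observes its fields.
    head_.store(e, std::memory_order_release);
  }

  // Lock-free reader. Acquire is the portable choice in this sketch.
  std::uintptr_t lookup(std::uintptr_t pc) const {
    for (const CacheEntry* e = head_.load(std::memory_order_acquire);
         e != nullptr; e = e->next) {
      if (e->pc == pc) return e->handler;
    }
    return 0;  // not cached
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex lock_;
};
```

Without the releasing store, a reader on a weakly ordered machine could see the new head pointer before the entry's `pc`/`handler` fields, which is the stale-data problem described above.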

I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
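The is_alive cleanup amounts to copying the volatile field into a local once and testing the local. A minimal sketch with hypothetical state names and encoding (assumptions for illustration, not copied from nmethod.hpp):

```cpp
#include <cstdint>

// Hypothetical state encoding, for illustration only.
enum MethodState : std::uint8_t {
  in_use = 0,
  not_entrant = 1,
  zombie = 2,
  unloaded = 3
};

struct MethodSketch {
  volatile std::uint8_t state = in_use;

  // Each mention of `state` is a separate volatile access, so the compiler
  // must emit two loads, and the two loads may observe different values.
  bool is_alive_twice() const {
    return state == in_use || state == not_entrant;
  }

  // Single volatile load into a local, then test the local.
  bool is_alive_once() const {
    const std::uint8_t s = state;
    return s == in_use || s == not_entrant;
  }
};
```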

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/

Please review. I will also need a sponsor.

Best regards,
Martin



From nils.eliasson at oracle.com  Tue Apr  5 13:54:44 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 5 Apr 2016 15:54:44 +0200
Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails
	to compile method
In-Reply-To: <5702F14B.1020403@oracle.com>
References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com>
Message-ID: <5703C3A4.2040907@oracle.com>

Hi Vladimir,

On 2016-04-05 00:57, Vladimir Kozlov wrote:
> 2 tests have -XX:+PrintCompilation flag added. Why you need it?
>

It helps a lot to have a compilation log to start with when these hard 
to reproduce failures happen. Those two tests test the compilation parts 
of the WB API.

I ran into another issue in these tests - the compile()-method in 
CompilerWhiteBoxTest is not reliable unless the invocation counter decay 
is turned off. I added a check of the UseCounterDecay-flag in that 
method so that no one will miss it by accident.

Best regards,
Nils Eliasson


> Thanks,
> Vladimir
>
> On 4/1/16 6:55 AM, Nils Eliasson wrote:
>> Hi all,
>>
>> Please review this fix.
>>
>> Summary:
>> There is a mismatch in the CompilerWhiteBox testcases between the
>> callable and the executable constructors. SimpleTestCase$Helper
>> implements all constructors and methods that are tested. However since
>> Helper is an inner class there will be an extra (javac created)
>> constructor that has the parent class as an appended argument. The
>> callable will invoke this constructor, but the executable will reference
>> the normal constructor.
>>
>> Solution:
>> Stop having Helper as an inner class. Rename it to
>> SimpleTestCaseHelper for some uniqueness in compiler commands and
>> directives.
>>
>> Testing:
>> Run all hotspot/compiler/whitebox tests on all platforms, and all
>> hotspot/compiler tests on one platform.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880
>> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/
>>
>> Best regards,
>> Nils Eliasson


From nils.eliasson at oracle.com  Tue Apr  5 13:56:07 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 5 Apr 2016 15:56:07 +0200
Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails
	to compile method
In-Reply-To: <5703C3A4.2040907@oracle.com>
References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com>
	<5703C3A4.2040907@oracle.com>
Message-ID: <5703C3F7.1020900@oracle.com>

I forgot the webrev link:
http://cr.openjdk.java.net/~neliasso/8151880/webrev.03/

Regards,
Nils

On 2016-04-05 15:54, Nils Eliasson wrote:
> Hi Vladimir,
>
> On 2016-04-05 00:57, Vladimir Kozlov wrote:
>> 2 tests have -XX:+PrintCompilation flag added. Why you need it?
>>
>
> It helps a lot to have a compilation log to start with when these hard 
> to reproduce failures happen. Those two tests test the compilation 
> parts of the WB API.
>
> I ran into another issue in these tests - the compile()-method in 
> CompilerWhiteBoxTest is not reliable unless the invocation counter 
> decay is turned off. I added a check of the UseCounterDecay-flag in 
> that method so that no one will miss it by accident.
>
> Best regards,
> Nils Eliasson
>
>
>> Thanks,
>> Vladimir
>>
>> On 4/1/16 6:55 AM, Nils Eliasson wrote:
>>> Hi all,
>>>
>>> Please review this fix.
>>>
>>> Summary:
>>> There is a mismatch in the CompilerWhiteBox testcases between the
>>> callable and the executable constructors. SimpleTestCase$Helper
>>> implements all constructors and methods that are tested. However since
>>> Helper is an inner class there will be an extra (javac created)
>>> constructor that has the parent class as an appended argument. The
>>> callable will invoke this constructor, but the executable will 
>>> reference
>>> the normal constructor.
>>>
>>> Solution:
>>> Stop having Helper as an inner class. Rename it to
>>> SimpleTestCaseHelper for some uniqueness in compiler commands and
>>> directives.
>>>
>>> Testing:
>>> Run all hotspot/compiler/whitebox tests on all platforms, and all
>>> hotspot/compiler tests on one platform.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/
>>>
>>> Best regards,
>>> Nils Eliasson
>


From doug.simon at oracle.com  Tue Apr  5 14:16:19 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Tue, 5 Apr 2016 16:16:19 +0200
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an
	nmethod
In-Reply-To: <F2652198-15CC-4C2E-9E92-28594F5D4F0E@oracle.com>
References: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
	<8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com>
	<3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com>
	<F2652198-15CC-4C2E-9E92-28594F5D4F0E@oracle.com>
Message-ID: <C912108E-3B91-4C44-A8CA-A04F3FB41387@oracle.com>


> On 05 Apr 2016, at 02:00, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> 
>> On Apr 4, 2016, at 12:34 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>> 
>> No, not good.  We are failing a couple JVMCI tests.  Looking into it...
> 
> Ok, this got a little out of control but for the better:
> 
> http://cr.openjdk.java.net/~twisti/8153439/webrev.01/
> 
> The actual fix is to check for a null log argument.  The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE.  This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions.
> 
> While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone.  Again, stupid jtreg.
> 
> Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog.  I think it should.

I agree. Can you modify your derivative webrev for that? Of course, we wouldn't need the cast to HotSpotSpeculationLog in HotSpotCodeCacheProvider once you've made that change.

-Doug

From vladimir.x.ivanov at oracle.com  Tue Apr  5 15:12:19 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 5 Apr 2016 18:12:19 +0300
Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/
	incremental inlining
Message-ID: <5703D5D3.7020801@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00
https://bugs.openjdk.java.net/browse/JDK-8152590

Constant folding of stable field loads only happens during parsing. 
During incremental (post-parse) inlining some loads can become foldable, 
but they aren't optimized.

Though the fix is pretty trivial (webrev.00.02), I decided to refactor 
relevant code and get rid of redundant parts.

To ease the review I split the change into 4 parts:

(1) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.00

   * extracted all constant-related checks into 
ciField::constant_value()/ciField::constant_value_of();

   * common constant folding logic into 
GraphKit::make_constant_from_field()/Type::make_constant_*():

     Parse::do_get_xxx() / LibraryCallKit::inline_unsafe_access()
       GraphKit::make_constant_from_field(ciField*, Node*)
         Type::make_constant_from_field(ciField*, ...)
           Type::make_from_constant(ciConstant, ...)

   * fold_stable_ary_elem is moved to 
Type::make_constant_from_array_element()

   * check_mismatched_access is moved to type.cpp


(2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01

Refactored constant folding logic for static final fields and unified 
folding logic with instance fields: is_constant() depends only on the 
flags and caller should check return value from 
ciField::constant_value() for validity (ciConstant.is_valid())
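The contract in (2), where is_constant() looks only at flags while the caller must still check the returned constant for validity, can be sketched with hypothetical stand-in types. `ConstantSketch`, `FieldSketch`, and `try_fold` are invented for illustration; the real ciField/ciConstant API differs.

```cpp
#include <cstdint>

// Hypothetical stand-in for ciConstant.
struct ConstantSketch {
  bool valid = false;
  std::int64_t value = 0;
  bool is_valid() const { return valid; }
};

// Hypothetical stand-in for ciField.
struct FieldSketch {
  bool is_static_final = false;
  bool is_stable = false;
  bool has_foldable_value = false;  // e.g. holder initialized, non-default value
  std::int64_t current_value = 0;

  // Depends only on the flags.
  bool is_constant() const { return is_static_final || is_stable; }

  // May still return an invalid constant even when is_constant() is true.
  ConstantSketch constant_value() const {
    if (!is_constant() || !has_foldable_value) return ConstantSketch{};
    return ConstantSketch{true, current_value};
  }
};

// Caller pattern: check the flags, then check the returned constant.
bool try_fold(const FieldSketch& f, std::int64_t* out) {
  if (!f.is_constant()) return false;
  const ConstantSketch c = f.constant_value();
  if (!c.is_valid()) return false;  // e.g. a @Stable field still at its default
  *out = c.value;
  return true;
}
```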


(3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02

Do constant folding for fields (both static and instance) in 
LoadNode::Value.


(4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03

Mark CallSite::target field as constant.


Also:

   * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java

Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane.

Thanks!

Best regards,
Vladimir Ivanov

From vladimir.kozlov at oracle.com  Tue Apr  5 15:20:08 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 08:20:08 -0700
Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks
In-Reply-To: <5703652C.6000000@oracle.com>
References: <5703652C.6000000@oracle.com>
Message-ID: <5703D7A8.3090602@oracle.com>

Looks good. Tobias, thank you for investigating the assert crashes.

Thanks,
Vladimir

On 4/5/16 12:11 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8151724
> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/
>
> The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag.
>
> Tested with JPRT and RBT.
>
> Thanks,
> Tobias
>

From tom.rodriguez at oracle.com  Tue Apr  5 15:21:53 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 5 Apr 2016 08:21:53 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
Message-ID: <F4124C58-66AA-47AE-9278-7EBBC78549C6@oracle.com>

looks good to me.

tom

> On Apr 1, 2016, at 11:28 AM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
> 
> Thanks,
> igor


From tom.rodriguez at oracle.com  Tue Apr  5 15:23:49 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 5 Apr 2016 08:23:49 -0700
Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should
	return dependencies_failed
In-Reply-To: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com>
References: <EFA1C857-34E1-4131-AF1A-FEF515A7D6FB@oracle.com>
	<110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com>
Message-ID: <A2A217BF-2795-48B9-9F14-DDE3ECACA587@oracle.com>

Thanks!

tom

> On Apr 1, 2016, at 1:18 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> Looks good.
> 
> igor
> 
>> On Apr 1, 2016, at 12:47 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>> 
>> http://cr.openjdk.java.net/~never/8153315/webrev
>> 
>> This fixes a minor issue that showed up while debugging Java code.  evol_method dependencies can change at any time, so a failure there is just a normal dependency failure, not invalid dependencies.  Graal considers it an error to build invalid dependencies, so it complained.  Tested under the Eclipse debugger.
>> 
>> tom
> 


From tobias.hartmann at oracle.com  Tue Apr  5 15:28:57 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Apr 2016 17:28:57 +0200
Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks
In-Reply-To: <5703D7A8.3090602@oracle.com>
References: <5703652C.6000000@oracle.com> <5703D7A8.3090602@oracle.com>
Message-ID: <5703D9B9.6010206@oracle.com>

Thanks, Vladimir!

Best regards,
Tobias

On 05.04.2016 17:20, Vladimir Kozlov wrote:
> Looks good. Tobias, thank you for investigating the assert crashes.
> 
> Thanks,
> Vladimir
> 
> On 4/5/16 12:11 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8151724
>> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/
>>
>> The flag -XX:GenerateCompilerNullChecks is a develop flag and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled, so I would like to remove the flag.
>>
>> Tested with JPRT and RBT.
>>
>> Thanks,
>> Tobias
>>

From vladimir.kozlov at oracle.com  Tue Apr  5 15:32:57 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 08:32:57 -0700
Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails
	to compile method
In-Reply-To: <5703C3F7.1020900@oracle.com>
References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com>
	<5703C3A4.2040907@oracle.com> <5703C3F7.1020900@oracle.com>
Message-ID: <5703DAA9.7010707@oracle.com>

Yes, adding the UseCounterDecay check makes sense in this case.
And I agree with PrintCompilation, since it helps diagnose problems.
Reviewed.

Thanks,
Vladimir

On 4/5/16 6:56 AM, Nils Eliasson wrote:
> I forgot the webrev link:
> http://cr.openjdk.java.net/~neliasso/8151880/webrev.03/
>
> Regards,
> Nils
>
> On 2016-04-05 15:54, Nils Eliasson wrote:
>> Hi Vladimir,
>>
>> On 2016-04-05 00:57, Vladimir Kozlov wrote:
>>> 2 tests have the -XX:+PrintCompilation flag added. Why do you need it?
>>>
>>
>> It helps a lot to have a compilation log to start with when these
>> hard-to-reproduce failures happen. Those two tests test the compilation
>> parts of the WB API.
>>
>> I ran into another issue in these tests - the compile()-method in
>> CompilerWhiteBoxTest is not reliable unless the invocation counter
>> decay is turned off. I added a check of the UseCounterDecay-flag in
>> that method so that no one will miss it by accident.
>>
>> Best regards,
>> Nils Eliasson
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/1/16 6:55 AM, Nils Eliasson wrote:
>>>> Hi all,
>>>>
>>>> Please review this fix.
>>>>
>>>> Summary:
>>>> There is a mismatch in the CompilerWhiteBox testcases between the
>>>> callable and the executable constructors. SimpleTestCase$Helper
>>>> implements all constructors and methods that are tested. However since
>>>> Helper is an inner class there will be an extra (javac created)
>>>> constructor that has the parent class as an appended argument. The
>>>> callable will invoke this constructor, but the executable will
>>>> reference
>>>> the normal constructor.
>>>>
>>>> Solution:
>>>> Stop having Helper as an inner class. Rename it to
>>>> SimpleTestCaseHelper for some uniqueness in compiler commands and
>>>> directives.
>>>>
>>>> Testing:
>>>> Run all hotspot/compiler/whitebox tests on all platforms, and all
>>>> hotspot/compiler tests on one platform.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/
>>>>
>>>> Best regards,
>>>> Nils Eliasson
>>
>

From igor.veresov at oracle.com  Tue Apr  5 15:57:41 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 08:57:41 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <F4124C58-66AA-47AE-9278-7EBBC78549C6@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<F4124C58-66AA-47AE-9278-7EBBC78549C6@oracle.com>
Message-ID: <BB808825-AF33-4D67-AA75-A3EEED1BF60F@oracle.com>

Thanks, Tom!

igor

> On Apr 5, 2016, at 8:21 AM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> looks good to me.
> 
> tom
> 
>> On Apr 1, 2016, at 11:28 AM, Igor Veresov <igor.veresov at oracle.com> wrote:
>> 
>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>> 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>> 
>> Thanks,
>> igor
> 


From lois.foltan at oracle.com  Tue Apr  5 16:30:27 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 05 Apr 2016 12:30:27 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
Message-ID: <5703E823.8050400@oracle.com>

Hi Igor,

I know you have two reviews for this but could you hold off committing 
until I or Karen Kinnear have a chance to review.  We both worked in 
this area a lot to support default methods in JDK 8. Also, have you run 
the hotspot/test/runtime/SelectionResolution tests on this?

Thanks,
Lois

On 4/1/2016 2:28 PM, Igor Veresov wrote:
> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>
> Thanks,
> igor


From igor.veresov at oracle.com  Tue Apr  5 16:50:49 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 09:50:49 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <5703E823.8050400@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
Message-ID: <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>

Hi Lois,

Thanks for looking at it. Yes, it passes all hotspot jtreg tests.

igor

> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
> 
> Hi Igor,
> 
> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
> 
> Thanks,
> Lois
> 
> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>> 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>> 
>> Thanks,
>> igor
> 


From lois.foltan at oracle.com  Tue Apr  5 17:34:15 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 05 Apr 2016 13:34:15 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
Message-ID: <5703F717.702@oracle.com>


On 4/5/2016 12:50 PM, Igor Veresov wrote:
> Hi Lois,
>
> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>
> igor
Hi Igor,

Thanks for waiting on this.  A couple of comments:

- Section 6.5 (Instructions) of the JVMS does indicate, under "Linking Exceptions"
for invokeinterface, that the VM can throw an ICCE if the resolved method is
static or private.  So I think moving this exception from runtime to
linktime is okay.

- I'm concerned about the changes on lines #998, #1030, and #1316.  I don't
think you are necessarily guaranteed to have the bytecodes that you are
now passing to resolve_interface_method.  For example, at line #998 within
linktime_resolve_static_method, you may not have an invokestatic here;
you may have another invoke* bytecode trying to invoke a static
interface method.  Passing in Bytecodes::_invokestatic seems wrong,
because even if the resolved method is static, "nostatics" was set to false.

Just curious did you also run the testbase default methods tests?
Lois

>
>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
>>
>> Hi Igor,
>>
>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>
>> Thanks,
>> Lois
>>
>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>>>
>>> Thanks,
>>> igor


From igor.veresov at oracle.com  Tue Apr  5 17:44:56 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 10:44:56 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <5703F717.702@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
Message-ID: <AD8F4A4B-AC14-46C5-BBAF-6432FEECA5F4@oracle.com>


> On Apr 5, 2016, at 10:34 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
> 
> 
> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>> Hi Lois,
>> 
>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>> 
>> igor
> Hi Igor,
> 
> Thanks for waiting on this.  A couple of comments:
> 
> - Section 6.5 (Instructions) of the JVMS does indicate, under "Linking Exceptions" for invokeinterface, that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
> 
> - I'm concerned about the changes on lines #998, #1030, and #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
> 

I looked at the call graphs of these guys, and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction?

> Just curious did you also run the testbase default methods tests?

Yes, within the context of a closed project. That's actually what made these changes necessary.

igor

> Lois
> 
>> 
>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
>>> 
>>> Hi Igor,
>>> 
>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>> 
>>> Thanks,
>>> Lois
>>> 
>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>> 
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>>>> 
>>>> Thanks,
>>>> igor
> 


From tom.rodriguez at oracle.com  Tue Apr  5 17:55:04 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 5 Apr 2016 10:55:04 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <5703F717.702@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
Message-ID: <29A506A0-BEC9-4514-B8E9-92E6C29B1E40@oracle.com>

> Thanks for waiting on this.  A couple of comments:
> 
> - Section 6.5 (Instructions) of the JVMS does indicate, under "Linking Exceptions" for invokeinterface, that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
> 
> - I'm concerned about the changes on lines #998, #1030, and #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.

Unless I'm misunderstanding the code, I'd say that it was a bug that nostatics was being passed as false.  The standard naming in LinkResolver follows a pattern where the name is associated with the bytecode being used for the invoke.  So if you are in {linktime,runtime}_resolve_{foo}_method, then an invokefoo bytecode is what's being used.  resolve_interface_method currently does not follow that naming scheme, which I think should be fixed.  Maybe resolve_method_in_interface?  I do agree we should be sure that the paths leading to this call agree about the actual bytecode being used, but I can't see a path where a different bytecode could have been passed in.

tom

> 
> Just curious did you also run the testbase default methods tests?
> Lois
> 
>> 
>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
>>> 
>>> Hi Igor,
>>> 
>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>> 
>>> Thanks,
>>> Lois
>>> 
>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>> 
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>>>> 
>>>> Thanks,
>>>> igor
> 


From igor.veresov at oracle.com  Tue Apr  5 17:56:55 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 10:56:55 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <AD8F4A4B-AC14-46C5-BBAF-6432FEECA5F4@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<AD8F4A4B-AC14-46C5-BBAF-6432FEECA5F4@oracle.com>
Message-ID: <C43AC2AB-C557-4E25-B7EC-EAAA24B1C38D@oracle.com>


> On Apr 5, 2016, at 10:44 AM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
>> 
>> On Apr 5, 2016, at 10:34 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
>> 
>> 
>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>> Hi Lois,
>>> 
>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>> 
>>> igor
>> Hi Igor,
>> 
>> Thanks for waiting on this.  A couple of comments:
>> 
>> - Section 6.5 (Instructions) of the JVMS does indicate, under "Linking Exceptions" for invokeinterface, that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>> 
>> - I'm concerned about the changes on lines #998, #1030, and #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>> 
> 
> I looked at the call graphs of these guys, and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction?

Actually, an easier way to think about it would be: the answer returned by resolve_interface_method() is the result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, it does not mean that the invocation really results from that instruction; the bytecode describes the context. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be.

The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate that the query is made within the invokeinterface context. But passing a bytecode makes the code easier to read.
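To illustrate passing the bytecode as a context rather than as a record of the instruction actually executed, here is a minimal toy model. Everything in it (the `Bytecode` enum, `MethodInfo`, `ResolveResult`, and this signature of `resolve_interface_method`) is hypothetical and is not HotSpot's LinkResolver API. It models only the invokeinterface rule: a private or static resolved method yields an ICCE, and the check needs no receiver, which is why it can live at link time.

```cpp
#include <string>

// Hypothetical stand-ins; these are NOT HotSpot's real types.
enum class Bytecode { invokestatic, invokespecial, invokevirtual, invokeinterface };

struct MethodInfo {
  std::string name;
  bool is_private;
  bool is_static;
};

enum class ResolveResult { ok, icce };

// Toy link-time resolution step. 'context' names the invoke* instruction on
// whose behalf resolution is performed, not necessarily the instruction that
// is really executing. Only the resolved method is inspected; no receiver.
ResolveResult resolve_interface_method(const MethodInfo& m, Bytecode context) {
  if (context == Bytecode::invokeinterface && (m.is_private || m.is_static)) {
    return ResolveResult::icce;  // stands in for IncompatibleClassChangeError
  }
  return ResolveResult::ok;      // other contexts' rules are not modeled here
}
```

A bool such as `in_invokeinterface_context` would carry the same information; the enum simply makes call sites like `resolve_interface_method(m, Bytecode::invokestatic)` read as a statement of the context.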

igor 

> 
>> Just curious did you also run the testbase default methods tests?
> 
> Yes, within the context of a closed project. That's actually what made these changes necessary.
> 
> igor
> 
>> Lois
>> 
>>> 
>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
>>>> 
>>>> Hi Igor,
>>>> 
>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>> 
>>>> Thanks,
>>>> Lois
>>>> 
>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>> 
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>>>>> 
>>>>> Thanks,
>>>>> igor


From karen.kinnear at oracle.com  Tue Apr  5 18:12:41 2016
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Tue, 5 Apr 2016 14:12:41 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <BB808825-AF33-4D67-AA75-A3EEED1BF60F@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<F4124C58-66AA-47AE-9278-7EBBC78549C6@oracle.com>
	<BB808825-AF33-4D67-AA75-A3EEED1BF60F@oracle.com>
Message-ID: <01AE161D-FAE2-4F37-9E79-3951DB516EE0@oracle.com>

Igor,

I'd like to get back to you on this before you check in the change, please. I need to sanity-check the JVMS and the code.

I have another set of tests written by Vladimir Ivanov that I will send you as well.

thanks,
Karen

> On Apr 5, 2016, at 11:57 AM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> Thanks, Tom!
> 
> igor
> 
>> On Apr 5, 2016, at 8:21 AM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>> 
>> looks good to me.
>> 
>> tom
>> 
>>> On Apr 1, 2016, at 11:28 AM, Igor Veresov <igor.veresov at oracle.com> wrote:
>>> 
>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>> 
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>>> 
>>> Thanks,
>>> igor
>> 
> 


From karen.kinnear at oracle.com  Tue Apr  5 19:04:12 2016
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Tue, 5 Apr 2016 15:04:12 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <5703F717.702@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
Message-ID: <E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>

I am in agreement with Lois that the JVMS looks good with moving the exception.

With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invokestatic - but after my next
meeting I will check one more time. It might be worth adding a comment.

My concern is specific to jvmciCompilerToVM.cpp, where a method called resolveMethod checks
holder_klass->is_interface and, if so, calls LR::linktime_resolve_interface_method_or_null.

I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the
corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code,
so that you get the correct behavior depending on the requesting byte code.

I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so
I could use help studying this a bit more to understand if this really is resolution or is really selection.

thanks,
Karen
 
> On Apr 5, 2016, at 1:34 PM, Lois Foltan <lois.foltan at oracle.com> wrote:
> 
> 
> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>> Hi Lois,
>> 
>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>> 
>> igor
> Hi Igor,
> 
> Thanks for waiting on this.  A couple of comments:
> 
> - Section 6.5 (Instructions) of the JVMS does indicate, under "Linking Exceptions" for invokeinterface, that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
> 
> - I'm concerned about the changes on lines #998, #1030, and #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
> 
> Just curious did you also run the testbase default methods tests?
> Lois
> 
>> 
>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
>>> 
>>> Hi Igor,
>>> 
>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>> 
>>> Thanks,
>>> Lois
>>> 
>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>> 
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>>>> 
>>>> Thanks,
>>>> igor
> 


From vladimir.kozlov at oracle.com  Tue Apr  5 19:23:44 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 12:23:44 -0700
Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/
	incremental inlining
In-Reply-To: <5703D5D3.7020801@oracle.com>
References: <5703D5D3.7020801@oracle.com>
Message-ID: <570410C0.7080105@oracle.com>


On 4/5/16 8:12 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00
> https://bugs.openjdk.java.net/browse/JDK-8152590
>
> Constant folding of stable field loads only happens during parsing.
> During incremental (post-parse) inlining some loads can become foldable,
> but they aren't optimized.
>
> Though the fix is pretty trivial (webrev.00.02), I decided to refactor
> relevant code and get rid of redundant parts.
>
> To ease the review I split the change into 4 parts:
>
> (1) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.00
>
>    * extracted all constant-related checks into
> ciField::constant_value()/ciField::constant_value_of();
>
>    * common constant folding logic into
> GraphKit::make_constant_from_field()/Type::make_constant_*():
>
>      Parse::do_get_xxx() / LibraryCallKit::inline_unsafe_access()
>        GraphKit::make_constant_from_field(ciField*, Node*)
>          Type::make_constant_from_field(ciField*, ...)
>            Type::make_from_constant(ciConstant, ...)
>
>    * fold_stable_ary_elem is moved to
> Type::make_constant_from_array_element()
>
>    * check_mismatched_access is moved to type.cpp

Type::make_constant_from_field(): is_stable_array and stable_dimension
are needed only at the end, for the make_from_constant() call; move them there.

check_mismatched_access() result is used only in assert. Should you put 
the call and assert under #ifdef ASSERT?
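The suggestion above can be sketched as follows. All names here are hypothetical (this `check_mismatched_access` is a toy with an invented signature, and `constant_from_field` is made up); the point is that when a computed value feeds only an assert, wrapping both the computation and the assert in `#ifdef ASSERT` keeps the call out of builds that do not define ASSERT (HotSpot defines it in debug builds). HotSpot's own `assert` also takes a message argument; the standard `assert` is used so the sketch compiles on its own.

```cpp
#include <cassert>

#ifdef ASSERT
// Hypothetical debug-only helper: true if an access of 'access_size' bytes
// does not match a field of 'field_size' bytes.
static bool check_mismatched_access(int field_size, int access_size) {
  return field_size != access_size;
}
#endif

// Hypothetical folding helper: returns the constant value of a field.
int constant_from_field(int field_value, int field_size, int access_size) {
#ifdef ASSERT
  // Both the call and the assert exist only when ASSERT is defined, so
  // other builds never evaluate check_mismatched_access().
  bool mismatched = check_mismatched_access(field_size, access_size);
  assert(!mismatched && "mismatched access to a constant field");
#endif
  (void)field_size;   // avoid unused-parameter warnings when ASSERT is off
  (void)access_size;
  return field_value;
}
```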

>
>
> (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01
>
> Refactored constant folding logic for static final fields and unified
> folding logic with instance fields: is_constant() depends only on the
> flags and caller should check return value from
> ciField::constant_value() for validity (ciConstant.is_valid())
>

Good.

>
> (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02
>
> Do constant folding for fields (both static and instance) in
> LoadNode::Value.

Good.

>
>
> (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03
>
> Mark CallSite::target field as constant.

Okay.

Thanks,
Vladimir

>
>
> Also:
>
>    * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java
>
> Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane.
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov

From igor.veresov at oracle.com  Tue Apr  5 19:43:51 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 12:43:51 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>
Message-ID: <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com>


> On Apr 5, 2016, at 12:04 PM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
> 
> I am in agreement with Lois that the JVMS looks good with moving the exception.

Thanks!
> 
> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invokestatic - but after my next
> meeting I will check one more time. It might be worth adding a comment.

Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/
Again, the bytecode argument here indicates the context, not the actual bytecode. For example, the method may of course be invoked via a method handle.

> 
> My concern is specific to jvmciCompilerToVM.cpp, where a method called resolveMethod checks
> holder_klass->is_interface and, if so, calls LR::linktime_resolve_interface_method_or_null.
> 

That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903
In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolution. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp).
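The distinction between resolution and selection raised in this thread can be illustrated with a toy model. Everything here (`Klass`, `resolve_method`, `select_method`) is a hypothetical simplification, not HotSpot or JVMCI code, and it leaves out interfaces, default methods and access checks. Resolution (linktime) starts from the class named at the call site and does not look at the receiver; selection (runtime) starts from the receiver's actual class and walks up the superclass chain to find the implementation that will run.

```cpp
#include <map>
#include <string>

// Hypothetical toy class model; not HotSpot's Klass.
struct Klass {
  std::string name;
  const Klass* super;                          // nullptr for the root class
  std::map<std::string, std::string> methods;  // method name -> implementation id
};

// Linktime step: resolve against the class named in the call site.
// Returns the declaring class, or nullptr if no class in the chain has 'm'.
const Klass* resolve_method(const Klass* symbolic_holder, const std::string& m) {
  for (const Klass* k = symbolic_holder; k != nullptr; k = k->super) {
    if (k->methods.count(m) != 0) return k;
  }
  return nullptr;
}

// Runtime step: select the implementation, starting from the receiver's class.
std::string select_method(const Klass* receiver, const std::string& m) {
  for (const Klass* k = receiver; k != nullptr; k = k->super) {
    auto it = k->methods.find(m);
    if (it != k->methods.end()) return it->second;
  }
  return "";  // not reached if resolution succeeded for a subtype receiver
}
```

In this model an emulation of a virtual call would run `resolve_method` on the symbolic holder first and only then `select_method` on the receiver, which matches the two-step shape described above.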

igor

> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the
> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code,
> so that you get the correct behavior depending on the requesting byte code.
> 
> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so
> I could use help studying this a bit more to understand if this really is resolution or is really selection.
> 
> thanks,
> Karen
> 
>> On Apr 5, 2016, at 1:34 PM, Lois Foltan <lois.foltan at oracle.com> wrote:
>> 
>> 
>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>> Hi Lois,
>>> 
>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>> 
>>> igor
>> Hi Igor,
>> 
>> Thanks for waiting on this.  A couple of comments:
>> 
>> - Section 6.5 (Instructions) of the JVMS, under "Linking Exceptions" for invokeinterface, does indicate that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>> 
>> - I'm concerned about the change on line #998, #1030, #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>> 
>> Just curious did you also run the testbase default methods tests?
>> Lois
>> 
>>> 
>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
>>>> 
>>>> Hi Igor,
>>>> 
>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>> 
>>>> Thanks,
>>>> Lois
>>>> 
>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>> 
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/
>>>>> 
>>>>> Thanks,
>>>>> igor
>> 
> 


From vladimir.x.ivanov at oracle.com  Tue Apr  5 19:47:26 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 5 Apr 2016 22:47:26 +0300
Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/
	incremental inlining
In-Reply-To: <570410C0.7080105@oracle.com>
References: <5703D5D3.7020801@oracle.com> <570410C0.7080105@oracle.com>
Message-ID: <5704164E.3040006@oracle.com>

Thanks for review, Vladimir!

Updated version:
   http://cr.openjdk.java.net/~vlivanov/8152590/webrev.01
>>    * check_mismatched_access is moved to type.cpp
>
> Type::make_constant_from_field() - is_stable_array and stable_dimension
> are needed only at the end for make_from_constant() call, move them there.

Done.

> check_mismatched_access() result is used only in assert. Should you put
> the call and assert under #ifdef ASSERT?
No, it's a bug in the change: con should be used instead of field_value. 
check_mismatched_access filters out invalid accesses and adjusts the 
value for unsigned loads.

Fixed.
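To illustrate what "adjusts the value for unsigned loads" means, here is a hypothetical C++ sketch. The names `FieldKind`, `LoadKind`, `Folded`, and `adjust_constant` are invented and do not reflect the real signature of check_mismatched_access in type.cpp. It assumes that a constant is folded only when the load width matches the field width, and that unsigned sub-word loads (in the spirit of C2's LoadUB/LoadUS nodes) must observe the zero-extended value.

```cpp
#include <cassert>
#include <cstdint>

enum class FieldKind { Byte, Short, Char, Int };
enum class LoadKind { Byte, UnsignedByte, Short, UnsignedShort, Int };

// Result of an attempted fold; `valid == false` means "do not fold".
struct Folded {
  bool valid;
  std::int32_t value;
};

static int width(FieldKind k) {
  switch (k) {
    case FieldKind::Byte:  return 1;
    case FieldKind::Short:
    case FieldKind::Char:  return 2;
    case FieldKind::Int:   return 4;
  }
  return 0;
}

static int width(LoadKind k) {
  switch (k) {
    case LoadKind::Byte:
    case LoadKind::UnsignedByte:  return 1;
    case LoadKind::Short:
    case LoadKind::UnsignedShort: return 2;
    case LoadKind::Int:           return 4;
  }
  return 0;
}

// Filter out mismatched accesses, then sign- or zero-extend the field value
// according to the kind of load that reads it.
Folded adjust_constant(std::int32_t field_value, FieldKind field, LoadKind load) {
  if (width(field) != width(load)) return Folded{false, 0};  // mismatched access
  switch (load) {
    case LoadKind::UnsignedByte:  return Folded{true, field_value & 0xFF};
    case LoadKind::UnsignedShort: return Folded{true, field_value & 0xFFFF};
    case LoadKind::Byte:          return Folded{true, static_cast<std::int8_t>(field_value)};
    case LoadKind::Short:         return Folded{true, static_cast<std::int16_t>(field_value)};
    case LoadKind::Int:           return Folded{true, field_value};
  }
  return Folded{false, 0};
}
```

For example, a byte field holding -1 folds to 255 when it is read by an unsigned byte load, and to -1 when it is read by a signed one.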

Best regards,
Vladimir Ivanov

PS: I'll enhance the tests to catch unsigned field load case.

>
>>
>>
>> (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01
>>
>> Refactored constant folding logic for static final fields and unified
>> folding logic with instance fields: is_constant() depends only on the
>> flags and caller should check return value from
>> ciField::constant_value() for validity (ciConstant.is_valid())
>>
>
> Good.
>
>>
>> (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02
>>
>> Do constant folding for fields (both static and instance) in
>> LoadNode::Value.
>
> Good.
>
>>
>>
>> (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03
>>
>> Mark CallSite::target field as constant.
>
> Okay.
>
> Thanks,
> Vladimir
>
>>
>>
>> Also:
>>
>>    * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java
>>
>> Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane.
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
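The contract described in (2) can be sketched as follows: is_constant() looks only at the field's flags, and the caller must still check is_valid() on the result of constant_value(). `ToyField` and `ToyConstant` are hypothetical stand-ins for ciField and ciConstant. The rule that a @Stable field holding its default (zero) value yields no constant is an assumption chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ciConstant.
struct ToyConstant {
  bool valid;
  std::int64_t value;
  bool is_valid() const { return valid; }
};

// Hypothetical stand-in for ciField.
struct ToyField {
  bool is_final;
  bool is_stable;
  std::int64_t current_value;

  // Depends only on the flags, never on the current value.
  bool is_constant() const { return is_final || is_stable; }

  // May still decline to produce a constant. Assumed rule for this sketch:
  // a @Stable field that still holds its default value is not folded.
  ToyConstant constant_value() const {
    if (is_stable && current_value == 0) return ToyConstant{false, 0};
    return ToyConstant{true, current_value};
  }
};

// Caller pattern: check the flags first, then the validity of the constant.
bool try_fold(const ToyField& f, std::int64_t* out) {
  if (!f.is_constant()) return false;
  ToyConstant c = f.constant_value();
  if (!c.is_valid()) return false;
  *out = c.value;
  return true;
}
```

Separating the two checks keeps is_constant() a cheap, value-independent predicate and leaves all value-dependent decisions in one place.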

From lois.foltan at oracle.com  Tue Apr  5 19:54:14 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 05 Apr 2016 15:54:14 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <C43AC2AB-C557-4E25-B7EC-EAAA24B1C38D@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<AD8F4A4B-AC14-46C5-BBAF-6432FEECA5F4@oracle.com>
	<C43AC2AB-C557-4E25-B7EC-EAAA24B1C38D@oracle.com>
Message-ID: <570417E6.60107@oracle.com>


On 4/5/2016 1:56 PM, Igor Veresov wrote:
>
>> On Apr 5, 2016, at 10:44 AM, Igor Veresov <igor.veresov at oracle.com 
>> <mailto:igor.veresov at oracle.com>> wrote:
>>
>>>
>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan <lois.foltan at oracle.com 
>>> <mailto:lois.foltan at oracle.com>> wrote:
>>>
>>>
>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>> Hi Lois,
>>>>
>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>
>>>> igor
>>> Hi Igor,
>>>
>>> Thanks for waiting on this.  A couple of comments:
>>>
>>> - Section 6.5 (Instructions) of the JVMS, under "Linking Exceptions" 
>>> for invokeinterface, does indicate that the VM can throw an ICCE if 
>>> the resolved method is static or private.  So I think moving this 
>>> exception from runtime to linktime is okay.
>>>
>>> - I'm concerned about the change on line #998, #1030, #1316.  I 
>>> don't think you are necessarily guaranteed to have the bytecodes 
>>> that you are now passing to resolve_interface_method.  For example, 
>>> line #998 within linktime_resolve_static_method, you may not have an 
>>> invokestatic here, you may have another invoke* bytecode trying to 
>>> invoke a static interface method.  Passing in 
>>> Bytecodes::_invokestatic seems wrong, because even if the resolved 
>>> method is static, "nostatics" was set to false.
>>>
>>
>> I looked at the call graphs of these guys and 
>> linktime_resolve_X_method() methods actually seem to be called only 
>> within invokeX contexts. But maybe I missed something. Can you give 
>> an example of a path that may cause, say, 
>> linktime_resolve_static_method() be invoked for non-invokestatic 
>> instruction?
>
> Actually, the easier way to think about it, would be: The answer 
> returned by resolve_interface_method() is the result of a method 
> resolution in the interface class "as if" it were invoked by the given 
> bytecode instruction. It of course does not mean that the 
> invocation is really a result of the said instruction. The context is. 
> As you may see the logic around ?nostatic? did not change and the 
> logic around resolve_interface_method() being called within the 
> invokeinterface context is what we want it to be.
>
> The same effect could have been achieved by adding another bool 
> argument to resolve_interface_method() to indicate a question within 
> the invokeinterface context. But passing a bytecode makes the code 
> easier to read.

Okay, I saw Tom's reply as well to my comments and I reviewed all the 
call paths.  I think the change is okay.  I kind of wish that the 
original bytecode was stored in the LinkInfo structure so that we could 
just reference the actual bytecode used and make decisions based on that 
instead of the parameter approach, but that may be an RFE to investigate 
later.
Lois

>
> igor
>
>>
>>> Just curious did you also run the testbase default methods tests?
>>
>> Yes, within the context of a closed project. That's actually what 
>> made these changes necessary.
>>
>> igor
>>
>>> Lois
>>>
>>>>
>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com 
>>>>> <mailto:lois.foltan at oracle.com>> wrote:
>>>>>
>>>>> Hi Igor,
>>>>>
>>>>> I know you have two reviews for this but could you hold off 
>>>>> committing until I or Karen Kinnear have a chance to review. We 
>>>>> both worked in this area a lot to support default methods in JDK 
>>>>> 8. Also, have you run the hotspot/test/runtime/SelectionResolution 
>>>>> tests on this?
>>>>>
>>>>> Thanks,
>>>>> Lois
>>>>>
>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>> When invoking private interface methods with invokeinterface we 
>>>>>> throw ICCE. The check for that happens in the runtime part of the 
>>>>>> resolution, however, doing it at linktime seems like a better 
>>>>>> place, since the check doesn't depend on the receiver type. It 
>>>>>> also allows compiler interfaces that rely on linktime resolution 
>>>>>> to detect inconsistencies during parsing (see 
>>>>>> ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() 
>>>>>> (JVMCI) that are affected).
>>>>>>
>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ 
>>>>>> <http://cr.openjdk.java.net/%7Eiveresov/8153115/webrev.00/>
>>>>>>
>>>>>> Thanks,
>>>>>> igor
>


From vladimir.kozlov at oracle.com  Tue Apr  5 20:07:11 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 13:07:11 -0700
Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/
	incremental inlining
In-Reply-To: <5704164E.3040006@oracle.com>
References: <5703D5D3.7020801@oracle.com> <570410C0.7080105@oracle.com>
	<5704164E.3040006@oracle.com>
Message-ID: <57041AEF.4090907@oracle.com>

This looks good.

Thanks,
Vladimir K

On 4/5/16 12:47 PM, Vladimir Ivanov wrote:
> Thanks for review, Vladimir!
>
> Updated version:
>    http://cr.openjdk.java.net/~vlivanov/8152590/webrev.01
>>>    * check_mismatched_access is moved to type.cpp
>>
>> Type::make_constant_from_field() - is_stable_array and stable_dimension
>> are needed only at the end for make_from_constant() call, move them
>> there.
>
> Done.
>
>> check_mismatched_access() result is used only in assert. Should you put
>> the call and assert under #ifdef ASSERT?
> No, it's a bug in the change: con should be used instead of field_value.
> check_mismatched_access filters out invalid accesses and adjusts the
> value for unsigned loads.
>
> Fixed.
>
> Best regards,
> Vladimir Ivanov
>
> PS: I'll enhance the tests to catch unsigned field load case.
>
>>
>>>
>>>
>>> (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01
>>>
>>> Refactored constant folding logic for static final fields and unified
>>> folding logic with instance fields: is_constant() depends only on the
>>> flags and caller should check return value from
>>> ciField::constant_value() for validity (ciConstant.is_valid())
>>>
>>
>> Good.
>>
>>>
>>> (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02
>>>
>>> Do constant folding for fields (both static and instance) in
>>> LoadNode::Value.
>>
>> Good.
>>
>>>
>>>
>>> (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03
>>>
>>> Mark CallSite::target field as constant.
>>
>> Okay.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>>
>>> Also:
>>>
>>>    * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java
>>>
>>> Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane.
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov

From vladimir.kozlov at oracle.com  Tue Apr  5 20:33:34 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 13:33:34 -0700
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
Message-ID: <5704211E.5090007@oracle.com>

It looks good to me but I don't see the macroAssembler_libm_x86_*.cpp file 
changes in the webrev. Did you use 'hg remove' or simply remove them?

I will start pre-integration testing.

Thanks,
Vladimir

On 4/4/16 11:25 PM, Deshpande, Vivek R wrote:
> Hi Christian
>
> We have updated the patch as per the suggested changes.
>
> The webrev for the same is at this location for your review.
>
> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>
> We will soon send another patch for CompilerDirectives changes.
>
> Regards,
>
> Vivek
>
> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
> *Sent:* Tuesday, March 29, 2016 11:29 AM
> *To:* Rukmannagari, Shravya
> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86
>
>     On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>     <shravya.rukmannagari at intel.com
>     <mailto:shravya.rukmannagari at intel.com>> wrote:
>
>     Hi Christian,
>
>     We would add separate files for each intrinsic. By splitting the
>     CompilerDirectives, do you mean we have to add a separate file?
>     Sorry, I didn't quite get it.
>
> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for
> the CompilerDirectives changes and integrate them separately.
>
>
>
> Thanks,
>
> Shravya Rukmannagari.
>
> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
> *Sent:*Monday, March 28, 2016 5:18 PM
> *To:*Deshpande, Vivek R <vivek.r.deshpande at intel.com
> <mailto:vivek.r.deshpande at intel.com>>
> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov
> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
> Rukmannagari, Shravya <shravya.rukmannagari at intel.com
> <mailto:shravya.rukmannagari at intel.com>>
> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>
> I left this comment in the bug:
>
> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we
> should put every intrinsic in its own file, like we did for
> macroAssembler_x86_sha.cpp. They are already too big:
>
> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>      4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>      3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>
> Also, can we split out the CompilerDirectives changes?
>
>
>
>
>     On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>     <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>     wrote:
>
>     Hi all
>
>     We would like to contribute a patch which optimizes tan and log10
>     on the X86 architecture using the Intel LIBM library.
>
>     Could you please review and sponsor this patch.
>
>     Bug-id:
>
>     https://bugs.openjdk.java.net/browse/JDK-8152907
>     webrev:
>
>     http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>
>     Thanks and regards,
>
>     Vivek
>

From vivek.r.deshpande at intel.com  Tue Apr  5 20:41:48 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Tue, 5 Apr 2016 20:41:48 +0000
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <5704211E.5090007@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<5704211E.5090007@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>


Hi Vladimir

I will send you the patch with macroAssembler_libm_x86_*.cpp files removed.
Thank you for the review.

Regards,
Vivek
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Tuesday, April 05, 2016 1:34 PM
To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya
Cc: hotspot compiler
Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86

It looks good to me but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them?

I will start pre-integration testing.

Thanks,
Vladimir

On 4/4/16 11:25 PM, Deshpande, Vivek R wrote:
> Hi Christian
>
> We have updated the patch as per the suggested changes.
>
> The webrev for the same is at this location for your review.
>
> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>
> We will soon send another patch for CompilerDirectives changes.
>
> Regards,
>
> Vivek
>
> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
> *Sent:* Tuesday, March 29, 2016 11:29 AM
> *To:* Rukmannagari, Shravya
> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86
>
>     On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>     <shravya.rukmannagari at intel.com
>     <mailto:shravya.rukmannagari at intel.com>> wrote:
>
>     Hi Christian,
>
>     We would add separate files for each intrinsic. By splitting the
>     CompilerDirectives, do you mean we have to add a separate file?
>     Sorry, I didn't quite get it.
>
> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for 
> the CompilerDirectives changes and integrate them separately.
>
>
>
> Thanks,
>
> Shravya Rukmannagari.
>
> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
> *Sent:*Monday, March 28, 2016 5:18 PM
> *To:*Deshpande, Vivek R <vivek.r.deshpande at intel.com 
> <mailto:vivek.r.deshpande at intel.com>>
> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov 
> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
> Rukmannagari, Shravya <shravya.rukmannagari at intel.com 
> <mailto:shravya.rukmannagari at intel.com>>
> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>
> I left this comment in the bug:
>
> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we 
> should put every intrinsic in its own file, like we did for 
> macroAssembler_x86_sha.cpp. They are already too big:
>
> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>      4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>      3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>
> Also, can we split out the CompilerDirectives changes?
>
>
>
>
>     On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>     <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>     wrote:
>
>     Hi all
>
>     We would like to contribute a patch which optimizes tan and log10
>     on the X86 architecture using the Intel LIBM library.
>
>     Could you please review and sponsor this patch.
>
>     Bug-id:
>
>     https://bugs.openjdk.java.net/browse/JDK-8152907
>     webrev:
>
>     
>     http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>
>     Thanks and regards,
>
>     Vivek
>

From vladimir.kozlov at oracle.com  Tue Apr  5 20:47:28 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 13:47:28 -0700
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<5704211E.5090007@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>
Message-ID: <57042460.5070306@oracle.com>

Again, I can't apply the changes because of CR characters at the ends of lines in the patch file.

Vladimir
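Line-ending problems like this are usually fixed with a tool such as dos2unix before regenerating the patch. As a minimal illustration of the conversion itself, here is a C++ sketch. The function name `crlf_to_lf` is hypothetical. It rewrites each CRLF pair as a bare LF and leaves a lone CR untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every "\r\n" with "\n"; a '\r' not followed by '\n' is kept.
std::string crlf_to_lf(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
      continue;  // drop the CR; the following LF is copied on the next pass
    }
    out.push_back(in[i]);
  }
  return out;
}
```

Only CR characters that immediately precede an LF are removed, so content that legitimately contains a bare CR is not altered.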

On 4/5/16 1:41 PM, Deshpande, Vivek R wrote:
>
> Hi Vladimir
>
> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed.
> Thank you for the review.
>
> Regards,
> Vivek
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, April 05, 2016 1:34 PM
> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya
> Cc: hotspot compiler
> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>
> It looks good to me but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them?
>
> I will start pre-integration testing.
>
> Thanks,
> Vladimir
>
> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote:
>> Hi Christian
>>
>> We have updated the patch as per the suggested changes.
>>
>> The webrev for the same is at this location for your review.
>>
>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>>
>> We will soon send another patch for CompilerDirectives changes.
>>
>> Regards,
>>
>> Vivek
>>
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:* Tuesday, March 29, 2016 11:29 AM
>> *To:* Rukmannagari, Shravya
>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>>      On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>>      <shravya.rukmannagari at intel.com
>>      <mailto:shravya.rukmannagari at intel.com>> wrote:
>>
>>      Hi Christian,
>>
>>      We would add separate files for each intrinsic. By splitting the
>>      CompilerDirectives, do you mean we have to add a separate file?
>>      Sorry, I didn't quite get it.
>>
>> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for
>> the CompilerDirectives changes and integrate them separately.
>>
>>
>>
>> Thanks,
>>
>> Shravya Rukmannagari.
>>
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:*Monday, March 28, 2016 5:18 PM
>> *To:*Deshpande, Vivek R <vivek.r.deshpande at intel.com
>> <mailto:vivek.r.deshpande at intel.com>>
>> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
>> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov
>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
>> Rukmannagari, Shravya <shravya.rukmannagari at intel.com
>> <mailto:shravya.rukmannagari at intel.com>>
>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>> I left this comment in the bug:
>>
>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we
>> should put every intrinsic in its own file, like we did for
>> macroAssembler_x86_sha.cpp. They are already too big:
>>
>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>>       4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>>       3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>>
>> Also, can we split out the CompilerDirectives changes?
>>
>>
>>
>>
>>      On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>>      <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>      wrote:
>>
>>      Hi all
>>
>>      We would like to contribute a patch which optimizes tan and log10
>>      on the X86 architecture using the Intel LIBM library.
>>
>>      Could you please review and sponsor this patch.
>>
>>      Bug-id:
>>
>>      https://bugs.openjdk.java.net/browse/JDK-8152907
>>      webrev:
>>
>>
>>      http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>
>>      Thanks and regards,
>>
>>      Vivek
>>

From vivek.r.deshpande at intel.com  Tue Apr  5 21:27:00 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Tue, 5 Apr 2016 21:27:00 +0000
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <57042460.5070306@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<5704211E.5090007@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>
	<57042460.5070306@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com>

HI Vladimir

Sorry about that.
Please check this webrev
http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/
I have updated it.

Regards,
Vivek

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Tuesday, April 05, 2016 1:47 PM
To: Deshpande, Vivek R; Rukmannagari, Shravya
Cc: hotspot compiler
Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86

Again, I can't apply the changes because of CR characters at the ends of lines in the patch file.

Vladimir

On 4/5/16 1:41 PM, Deshpande, Vivek R wrote:
>
> Hi Vladimir
>
> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed.
> Thank you for the review.
>
> Regards,
> Vivek
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, April 05, 2016 1:34 PM
> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya
> Cc: hotspot compiler
> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>
> It looks good to me but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them?
>
> I will start pre-integration testing.
>
> Thanks,
> Vladimir
>
> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote:
>> Hi Christian
>>
>> We have updated the patch as per the suggested changes.
>>
>> The webrev for the same is at this location for your review.
>>
>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>>
>> We will soon send another patch for CompilerDirectives changes.
>>
>> Regards,
>>
>> Vivek
>>
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:* Tuesday, March 29, 2016 11:29 AM
>> *To:* Rukmannagari, Shravya
>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>>      On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>>      <shravya.rukmannagari at intel.com
>>      <mailto:shravya.rukmannagari at intel.com>> wrote:
>>
>>      Hi Christian,
>>
>>      We would add separate files for each intrinsic. By splitting the
>>      CompilerDirectives, do you mean we have to add a separate file?
>>      Sorry, I didn't quite get it.
>>
>> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for 
>> the CompilerDirectives changes and integrate them separately.
>>
>>
>>
>> Thanks,
>>
>> Shravya Rukmannagari.
>>
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:*Monday, March 28, 2016 5:18 PM
>> *To:*Deshpande, Vivek R 
>> <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
>> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov 
>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
>> Rukmannagari, Shravya <shravya.rukmannagari at intel.com 
>> <mailto:shravya.rukmannagari at intel.com>>
>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>> I left this comment in the bug:
>>
>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files 
>> we should put every intrinsic in its own file, like we did for 
>> macroAssembler_x86_sha.cpp. They are already too big:
>>
>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>>       4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>>       3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>>
>> Also, can we split out the CompilerDirectives changes?
>>
>>
>>
>>
>>      On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>>      <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>      wrote:
>>
>>      Hi all
>>
>>      We would like to contribute a patch which optimizes tan and log10
>>      on the X86 architecture using the Intel LIBM library.
>>
>>      Could you please review and sponsor this patch.
>>
>>      Bug-id:
>>
>>      https://bugs.openjdk.java.net/browse/JDK-8152907
>>      webrev:
>>
>>
>>      http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>
>>      Thanks and regards,
>>
>>      Vivek
>>

From vladimir.kozlov at oracle.com  Tue Apr  5 21:42:04 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 14:42:04 -0700
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<5704211E.5090007@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>
	<57042460.5070306@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com>
Message-ID: <5704312C.9000605@oracle.com>

Problem found during build. Looks like we need #include 
"runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp:

hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: 
error: use of undeclared identifier 'SharedRuntime'
__ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));

Note templateInterpreterGenerator_x86_32.cpp has that #include.

It failed on macosx, where -DDONT_USE_PRECOMPILED_HEADER is used.

Vladimir


On 4/5/16 2:27 PM, Deshpande, Vivek R wrote:
> HI Vladimir
>
> Sorry about that.
> Please check this webrev
> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/
> I have updated it.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, April 05, 2016 1:47 PM
> To: Deshpande, Vivek R; Rukmannagari, Shravya
> Cc: hotspot compiler
> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>
> Again, I can't apply the changes because of CR characters at the ends of lines in the patch file.
>
> Vladimir
>
> On 4/5/16 1:41 PM, Deshpande, Vivek R wrote:
>>
>> Hi Vladimir
>>
>> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed.
>> Thank you for the review.
>>
>> Regards,
>> Vivek
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, April 05, 2016 1:34 PM
>> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya
>> Cc: hotspot compiler
>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>> It looks good to me but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them?
>>
>> I will start pre-integration testing.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote:
>>> Hi Christian
>>>
>>> We have updated the patch as per the suggested changes.
>>>
>>> The webrev for the same is at this location for your review.
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>>>
>>> We will soon send another patch for CompilerDirectives changes.
>>>
>>> Regards,
>>>
>>> Vivek
>>>
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:* Tuesday, March 29, 2016 11:29 AM
>>> *To:* Rukmannagari, Shravya
>>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86
>>>
>>>       On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>>>       <shravya.rukmannagari at intel.com
>>>       <mailto:shravya.rukmannagari at intel.com>> wrote:
>>>
>>>       Hi Christian,
>>>
>>>       We will add a separate file for each intrinsic. By splitting out the
>>>       CompilerDirectives changes, do you mean we have to add a separate file?
>>>       Sorry, I didn't quite get it.
>>>
>>> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for
>>> the CompilerDirectives changes and integrate them separately.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Shravya Rukmannagari.
>>>
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:* Monday, March 28, 2016 5:18 PM
>>> *To:* Deshpande, Vivek R
>>> <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
>>> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
>>> Rukmannagari, Shravya <shravya.rukmannagari at intel.com
>>> <mailto:shravya.rukmannagari at intel.com>>
>>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>>
>>> I left this comment in the bug:
>>>
>>> I think for the sanity of the macroAssembler_libm_x86_*.cpp files
>>> we should put every intrinsic in its own file, like we did for
>>> macroAssembler_x86_sha.cpp. They are already too big:
>>>
>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>>>        4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>>>        3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>>>
>>> Also, can we split out the CompilerDirectives changes?
>>>
>>>
>>>
>>>
>>>       On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>>>       <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>>       wrote:
>>>
>>>       Hi all
>>>
>>>       We would like to contribute a patch which optimizes tan and log10
>>>       on the x86 architecture using the Intel LIBM library.
>>>
>>>       Could you please review and sponsor this patch.
>>>
>>>       Bug-id:
>>>
>>>       https://bugs.openjdk.java.net/browse/JDK-8152907
>>>       webrev:
>>>
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>>
>>>       Thanks and regards,
>>>
>>>       Vivek
>>>
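
The patch could not be applied twice because its lines ended in CR characters (CRLF line endings, typically introduced by a Windows editor or mail client). A minimal sketch, assuming the goal is only to turn CRLF into LF before running `hg import` and that the target sources themselves use LF endings; the function name `strip_crlf` is hypothetical, and an existing tool such as `dos2unix`, where available, does the same job:

```cpp
#include <cstddef>
#include <string>

// Removes a carriage return that immediately precedes a line feed,
// turning CRLF line endings into LF. A lone '\r' that is not followed
// by '\n' is left untouched.
std::string strip_crlf(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
      continue;  // drop the CR; the following LF is copied next iteration
    }
    out.push_back(text[i]);
  }
  return out;
}
```

To normalize a whole patch file, read it into a `std::string`, pass it through `strip_crlf`, and write the result back out.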

From igor.veresov at oracle.com  Tue Apr  5 22:08:03 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 15:08:03 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <57041A5D.4040909@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<AD8F4A4B-AC14-46C5-BBAF-6432FEECA5F4@oracle.com>
	<C43AC2AB-C557-4E25-B7EC-EAAA24B1C38D@oracle.com>
	<570417E6.60107@oracle.com> <5704192A.3010807@oracle.com>
	<57041A5D.4040909@oracle.com>
Message-ID: <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com>

Coleen and Lois,

So, ok to push?

igor


> On Apr 5, 2016, at 1:04 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
> 
> 
> 
> On 4/5/16 3:59 PM, Coleen Phillimore wrote:
>> 
>> 
>> On 4/5/16 3:54 PM, Lois Foltan wrote:
>>> 
>>> On 4/5/2016 1:56 PM, Igor Veresov wrote:
>>>> 
>>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
>>>>> 
>>>>>> 
>>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>> 
>>>>>> 
>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>>>>> Hi Lois,
>>>>>>> 
>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>>>> 
>>>>>>> igor
>>>>>> Hi Igor,
>>>>>> 
>>>>>> Thanks for waiting on this.  A couple of comments:
>>>>>> 
>>>>>> - JVMS section 6.5 (Instructions), under "Linking Exceptions" for invokeinterface, does indicate that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>>>>> 
>>>>>> - I'm concerned about the change on line #998, #1030, #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>>>>> 
>>>>> 
>>>>> I looked at the call graphs of these guys and linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction?
>>>> 
>>>> Actually, the easier way to think about it would be: the answer returned by resolve_interface_method() is the result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, that doesn't mean the invocation really results from that instruction; the bytecode only indicates the context. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be.
>>>> 
>>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate that the query is made within the invokeinterface context. But passing a bytecode makes the code easier to read.
>>> 
>>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths.  I think the change is okay.  I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later.
>> 
>> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148
> 
> No, sorry, this change passes the 'tag'.  nvm.
> 
> Coleen
>> 
>> Just waiting for some test changes.
>> 
>> Coleen
>> 
>>> Lois
>>> 
>>>> 
>>>> igor
>>>> 
>>>>> 
>>>>>> Just curious did you also run the testbase default methods tests?
>>>>> 
>>>>> Yes, within the context of a closed project. That's actually what made these changes necessary.
>>>>> 
>>>>> igor
>>>>> 
>>>>>> Lois
>>>>>> 
>>>>>>> 
>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>>> 
>>>>>>>> Hi Igor,
>>>>>>>> 
>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Lois
>>>>>>>> 
>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>>>> 
>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/%7Eiveresov/8153115/webrev.00/>
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> igor
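
The proposal rests on the observation that the private/static check for invokeinterface depends only on the resolved method and the invocation context, never on the receiver type, so it can run at link time. A toy model of that idea, following the JVMS rules quoted in the thread; `InvokeContext`, `ResolvedMethod` and `interface_linktime_icce` are hypothetical and do not mirror HotSpot's `LinkResolver` API. Passing the bytecode as a context argument, rather than an extra bool, mirrors the design choice Igor describes:

```cpp
#include <string>

// Hypothetical, heavily simplified stand-ins for HotSpot's types.
enum class InvokeContext { invokestatic, invokespecial, invokevirtual, invokeinterface };

struct ResolvedMethod {
  std::string name;
  bool is_static;
  bool is_private;
};

// Returns a description of the IncompatibleClassChangeError that would be
// raised for this resolved method in the given invocation context, or an
// empty string if the link-time checks pass. Only the resolved method and
// the context are consulted; no receiver type is needed.
std::string interface_linktime_icce(const ResolvedMethod& m, InvokeContext ctx) {
  if (ctx == InvokeContext::invokeinterface) {
    if (m.is_static)  return "invokeinterface of static method " + m.name;
    if (m.is_private) return "invokeinterface of private method " + m.name;
  }
  if (ctx == InvokeContext::invokestatic && !m.is_static) {
    return "invokestatic of non-static method " + m.name;
  }
  return "";
}
```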


From lois.foltan at oracle.com  Tue Apr  5 22:12:37 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 05 Apr 2016 18:12:37 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<AD8F4A4B-AC14-46C5-BBAF-6432FEECA5F4@oracle.com>
	<C43AC2AB-C557-4E25-B7EC-EAAA24B1C38D@oracle.com>
	<570417E6.60107@oracle.com> <5704192A.3010807@oracle.com>
	<57041A5D.4040909@oracle.com>
	<11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com>
Message-ID: <57043855.1000405@oracle.com>


On 4/5/2016 6:08 PM, Igor Veresov wrote:
> Coleen and Lois,
>
> So, ok to push?
Yes, for me.  Have you addressed all of Karen's concerns as well?
Lois

>
> igor
>
>
>> On Apr 5, 2016, at 1:04 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>
>>
>>
>> On 4/5/16 3:59 PM, Coleen Phillimore wrote:
>>>
>>> On 4/5/16 3:54 PM, Lois Foltan wrote:
>>>> On 4/5/2016 1:56 PM, Igor Veresov wrote:
>>>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
>>>>>>
>>>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>>>>>> Hi Lois,
>>>>>>>>
>>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>>>>>
>>>>>>>> igor
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> Thanks for waiting on this.  A couple of comments:
>>>>>>>
>>>>>>> - JVMS section 6.5 (Instructions), under "Linking Exceptions" for invokeinterface, does indicate that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>>>>>>
>>>>>>> - I'm concerned about the change on line #998, #1030, #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>>>>>>
>>>>>> I looked at the call graphs of these guys and linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction?
>>>>> Actually, the easier way to think about it would be: the answer returned by resolve_interface_method() is the result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, that doesn't mean the invocation really results from that instruction; the bytecode only indicates the context. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be.
>>>>>
>>>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate that the query is made within the invokeinterface context. But passing a bytecode makes the code easier to read.
>>>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths.  I think the change is okay.  I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later.
>>> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148
>> No, sorry, this change passes the 'tag'.  nvm.
>>
>> Coleen
>>> Just waiting for some test changes.
>>>
>>> Coleen
>>>
>>>> Lois
>>>>
>>>>> igor
>>>>>
>>>>>>> Just curious did you also run the testbase default methods tests?
>>>>>> Yes, within the context of a closed project. That's actually what made these changes necessary.
>>>>>>
>>>>>> igor
>>>>>>
>>>>>>> Lois
>>>>>>>
>>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Igor,
>>>>>>>>>
>>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Lois
>>>>>>>>>
>>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>>>>>
>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/%7Eiveresov/8153115/webrev.00/>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> igor


From igor.veresov at oracle.com  Tue Apr  5 22:16:48 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 15:16:48 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <57043855.1000405@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<AD8F4A4B-AC14-46C5-BBAF-6432FEECA5F4@oracle.com>
	<C43AC2AB-C557-4E25-B7EC-EAAA24B1C38D@oracle.com>
	<570417E6.60107@oracle.com> <5704192A.3010807@oracle.com>
	<57041A5D.4040909@oracle.com>
	<11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com>
	<57043855.1000405@oracle.com>
Message-ID: <EF21336C-D596-4F30-884F-E5B7CF87E411@oracle.com>


> On Apr 5, 2016, at 3:12 PM, Lois Foltan <lois.foltan at oracle.com> wrote:
> 
> 
> On 4/5/2016 6:08 PM, Igor Veresov wrote:
>> Coleen and Lois,
>> 
>> So, ok to push?
> Yes, for me.  Have you addressed all of Karen's concerns as well?

Right. Karen, is it alright?

igor

> Lois
> 
>> 
>> igor
>> 
>> 
>>> On Apr 5, 2016, at 1:04 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>> 
>>> 
>>> 
>>> On 4/5/16 3:59 PM, Coleen Phillimore wrote:
>>>> 
>>>> On 4/5/16 3:54 PM, Lois Foltan wrote:
>>>>> On 4/5/2016 1:56 PM, Igor Veresov wrote:
>>>>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
>>>>>>> 
>>>>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>>>>>>> Hi Lois,
>>>>>>>>> 
>>>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>>>>>> 
>>>>>>>>> igor
>>>>>>>> Hi Igor,
>>>>>>>> 
>>>>>>>> Thanks for waiting on this.  A couple of comments:
>>>>>>>> 
>>>>>>>> - JVMS section 6.5 (Instructions), under "Linking Exceptions" for invokeinterface, does indicate that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>>>>>>> 
>>>>>>>> - I'm concerned about the change on line #998, #1030, #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>>>>>>> 
>>>>>>> I looked at the call graphs of these guys and linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction?
>>>>>> Actually, the easier way to think about it would be: the answer returned by resolve_interface_method() is the result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, that doesn't mean the invocation really results from that instruction; the bytecode only indicates the context. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be.
>>>>>> 
>>>>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate that the query is made within the invokeinterface context. But passing a bytecode makes the code easier to read.
>>>>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths.  I think the change is okay.  I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later.
>>>> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148
>>> No, sorry, this change passes the 'tag'.  nvm.
>>> 
>>> Coleen
>>>> Just waiting for some test changes.
>>>> 
>>>> Coleen
>>>> 
>>>>> Lois
>>>>> 
>>>>>> igor
>>>>>> 
>>>>>>>> Just curious did you also run the testbase default methods tests?
>>>>>>> Yes, within the context of a closed project. That's actually what made these changes necessary.
>>>>>>> 
>>>>>>> igor
>>>>>>> 
>>>>>>>> Lois
>>>>>>>> 
>>>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Igor,
>>>>>>>>>> 
>>>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Lois
>>>>>>>>>> 
>>>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>>>>>> 
>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/%7Eiveresov/8153115/webrev.00/>
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> igor
> 


From karen.kinnear at oracle.com  Tue Apr  5 22:33:17 2016
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Tue, 5 Apr 2016 18:33:17 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>
	<1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com>
Message-ID: <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com>

Igor,

Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter
for instance?

If so, I am ok with checking this in - further notes below.

> On Apr 5, 2016, at 3:43 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> 
>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear <karen.kinnear at oracle.com <mailto:karen.kinnear at oracle.com>> wrote:
>> 
>> I am in agreement with Lois that the JVMS looks good with moving the exception.
> 
> Thanks!
>> 
>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next
>> meeting I will check one more time. It might be worth adding a comment.
> 
> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/>
> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle.
> 
>> 
>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks
>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null.
>> 
> 
> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 <https://bugs.openjdk.java.net/browse/JDK-8152903>
> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp).

Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match
the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup.
That is ok with me - I will add a note to the bug.

Also: I see a ciMethod::check_call that has a comment - 
It appears to fail when applied to an invoke interface call site.
FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. 

Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take
the subtleties of invoke interface and invoke special into account. 
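
Whether resolveMethod performs resolution or selection turns on the JVMS distinction between the two steps: resolution starts from the class named by the symbolic reference and needs no receiver, while selection starts from the receiver's runtime class. A toy model of that distinction only, ignoring interfaces, access checks and the full JVMS overriding rules; `ModelClass`, `resolve_method` and `select_method` are hypothetical and unrelated to HotSpot's `LinkResolver` or JVMCI code:

```cpp
#include <map>
#include <string>

// Hypothetical model class: a single superclass chain, no interfaces.
struct ModelClass {
  std::string name;
  const ModelClass* super;
  std::map<std::string, bool> methods;  // method name -> is_abstract
};

// Resolution: start at the class named by the symbolic reference and walk
// up the superclass chain. Independent of any receiver object.
const ModelClass* resolve_method(const ModelClass* referenced, const std::string& m) {
  for (const ModelClass* k = referenced; k != nullptr; k = k->super) {
    if (k->methods.count(m) != 0) return k;
  }
  return nullptr;
}

// Selection: start at the receiver's runtime class and pick the first
// non-abstract declaration of the method. Depends on the receiver.
const ModelClass* select_method(const ModelClass* receiver, const std::string& m) {
  for (const ModelClass* k = receiver; k != nullptr; k = k->super) {
    auto it = k->methods.find(m);
    if (it != k->methods.end() && !it->second) return k;
  }
  return nullptr;
}
```

If a routine already has both the resolved method and the receiver in hand, as Karen observes for resolveMethod, what remains to do is closer to the second function than the first.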
> 
> igor
> 
>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the
>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code,
>> so that you get the correct behavior depending on the requesting byte code.
>> 
>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so
>> I could use help studying this a bit more to understand if this really is resolution or is really selection.
>> 
>> thanks,
>> Karen
>> 
>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>> 
>>> 
>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>> Hi Lois,
>>>> 
>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>> 
>>>> igor
>>> Hi Igor,
>>> 
>>> Thanks for waiting on this.  A couple of comments:
>>> 
>>> - JVMS section 6.5 (Instructions), under "Linking Exceptions" for invokeinterface, does indicate that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>> 
>>> - I'm concerned about the change on line #998, #1030, #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>> 
>>> Just curious did you also run the testbase default methods tests?
>>> Lois
>>> 
>>>> 
>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>> 
>>>>> Hi Igor,
>>>>> 
>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>> 
>>>>> Thanks,
>>>>> Lois
>>>>> 
>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>> 
>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 <https://bugs.openjdk.java.net/browse/JDK-8153115>
>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/>
>>>>>> 
>>>>>> Thanks,
>>>>>> igor
>>> 
>> 
> 


From igor.veresov at oracle.com  Tue Apr  5 23:22:37 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Apr 2016 16:22:37 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>
	<1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com>
	<4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com>
Message-ID: <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com>


> On Apr 5, 2016, at 3:33 PM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
> 
> Igor,
> 
> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter
> for instance?

Yes, I ran our RBT round of testing, which covers both -Xcomp and -Xmixed.

> 
> If so, I am ok with checking this in - further notes below.
> 
>> On Apr 5, 2016, at 3:43 PM, Igor Veresov <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
>> 
>> 
>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear <karen.kinnear at oracle.com <mailto:karen.kinnear at oracle.com>> wrote:
>>> 
>>> I am in agreement with Lois that the JVMS looks good with moving the exception.
>> 
>> Thanks!
>>> 
>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next
>>> meeting I will check one more time. It might be worth adding a comment.
>> 
>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/>
>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle.
>> 
>>> 
>>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks
>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null.
>>> 
>> 
>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 <https://bugs.openjdk.java.net/browse/JDK-8152903>
>> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp).
> 
> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match
> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup.
> That is ok with me - I will add a note to the bug.

Could you please explain what the problem is again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null for an interface holder when in reality it was an invokevirtual instruction, and vice versa?

> 
> Also: I see a ciMethod::check_call that has a comment - 
> It appears to fail when applied to an invoke interface call site.
> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. 
> 

This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas?

igor

> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take
> the subtleties of invoke interface and invoke special into account. 
>> 
>> igor
>> 
>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the
>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code,
>>> so that you get the correct behavior depending on the requesting byte code.
>>> 
>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so
>>> I could use help studying this a bit more to understand if this really is resolution or is really selection.
>>> 
>>> thanks,
>>> Karen
>>> 
>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>> 
>>>> 
>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>>> Hi Lois,
>>>>> 
>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>> 
>>>>> igor
>>>> Hi Igor,
>>>> 
>>>> Thanks for waiting on this.  A couple of comments:
>>>> 
>>>> - JVMS section 6.5 (Instructions), under "Linking Exceptions" for invokeinterface, does indicate that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>>> 
>>>> - I'm concerned about the change on line #998, #1030, #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>>> 
>>>> Just curious did you also run the testbase default methods tests?
>>>> Lois
>>>> 
>>>>> 
>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>> 
>>>>>> Hi Igor,
>>>>>> 
>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>> 
>>>>>> Thanks,
>>>>>> Lois
>>>>>> 
>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>> 
>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 <https://bugs.openjdk.java.net/browse/JDK-8153115>
>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/>
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> igor
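
A minimal sketch of the link-time rule discussed in this thread, assuming a simplified model: `Bytecode`, `ResolvedMethod` and `check_interface_call_at_link_time` are hypothetical names, not HotSpot's LinkResolver API. It shows why the check can move to link time (it reads only the bytecode and the resolved method, never the receiver) and why the concern about which bytecode is passed in matters: the outcome depends on that argument, so a caller that hard-codes a bytecode other than the one actually being linked gets the wrong answer.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified model; these are not HotSpot's LinkResolver types.
enum class Bytecode { invokeinterface, invokespecial, invokestatic, invokevirtual };

struct ResolvedMethod {
  std::string name;
  bool is_static;
  bool is_private;
};

struct IncompatibleClassChangeError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// The check needs only the bytecode and the resolved method, not the
// receiver's class, so it can run at link time instead of at call time.
void check_interface_call_at_link_time(Bytecode code, const ResolvedMethod& m) {
  if (code != Bytecode::invokeinterface) return;  // rule is specific to invokeinterface
  if (m.is_static || m.is_private) {
    throw IncompatibleClassChangeError(
        "invokeinterface resolved to " +
        std::string(m.is_static ? "static" : "private") + " method " + m.name);
  }
}
```

Passing `Bytecode::invokespecial` for the same private method does not throw, which is the behavior difference that makes the bytecode argument significant.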

From christian.thalinger at oracle.com  Tue Apr  5 23:52:57 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 5 Apr 2016 13:52:57 -1000
Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an
	nmethod
In-Reply-To: <C912108E-3B91-4C44-A8CA-A04F3FB41387@oracle.com>
References: <EF47A3DB-06DD-4AC3-AA3E-18F7FFB1BF88@oracle.com>
	<8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com>
	<3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com>
	<F2652198-15CC-4C2E-9E92-28594F5D4F0E@oracle.com>
	<C912108E-3B91-4C44-A8CA-A04F3FB41387@oracle.com>
Message-ID: <CEAD6FB6-0D44-4B24-A22E-CDD7A880380E@oracle.com>


> On Apr 5, 2016, at 4:16 AM, Doug Simon <doug.simon at oracle.com> wrote:
> 
>> 
>> On 05 Apr 2016, at 02:00, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>> 
>> 
>>> On Apr 4, 2016, at 12:34 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>> 
>>> No, not good.  We are failing a couple of JVMCI tests.  Looking into it...
>> 
>> Ok, this got a little out of control but for the better:
>> 
>> http://cr.openjdk.java.net/~twisti/8153439/webrev.01/
>> 
>> The actual fix is to check for a null log argument.  The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE.  This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions.
>> 
>> While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone.  Again, stupid jtreg.
>> 
>> Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog.  I think it should.
> 
> I agree. Can you modify your derivative webrev for that? Of course, we wouldn't need the cast to HotSpotSpeculationLog in HotSpotCodeCacheProvider once you've made that change.

Sure.  Here is the new webrev:

http://cr.openjdk.java.net/~twisti/8153439/webrev.02/

> 
> -Doug

From christian.thalinger at oracle.com  Wed Apr  6 01:37:30 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 5 Apr 2016 15:37:30 -1000
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
Message-ID: <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com>


> On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:
> 
> Hi Christian
>  
> We have updated the patch as per the suggested changes.
> The webrev for the same is at this location for your review.
> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ <http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/>
There are:
  73 #ifdef _LP64
 368 #else
 655 #endif
in the new files but I don't see them share any code.  Maybe it would be better to have dedicated x86_32 and x86_64 files.  Then the ifdefs are not required.

>  
> We will soon send another patch for CompilerDirectives changes.
>  
> Regards,
> Vivek
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] 
> Sent: Tuesday, March 29, 2016 11:29 AM
> To: Rukmannagari, Shravya
> Cc: Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>  
>  
> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya <shravya.rukmannagari at intel.com <mailto:shravya.rukmannagari at intel.com>> wrote:
>  
> Hi Christian,
> We would add separate files for each intrinsic. By splitting the CompilerDirectives, do you mean we have to add a separate file? Sorry, I didn't exactly get it.
>  
> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for the CompilerDirectives changes and integrate them separately.
> 
> 
>  
> Thanks,
> Shravya Rukmannagari.
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com <mailto:christian.thalinger at oracle.com>] 
> Sent: Monday, March 28, 2016 5:18 PM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>; Rukmannagari, Shravya <shravya.rukmannagari at intel.com <mailto:shravya.rukmannagari at intel.com>>
> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>  
> I left this comment in the bug:
>  
> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we should put every intrinsic in its own file, like we did for macroAssembler_x86_sha.cpp. They are already too big: 
> 
> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp 
>     4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp 
>     3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp 
>  
> Also, can we split out the CompilerDirectives changes?
> 
> 
> 
> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>> wrote:
>  
> Hi all
>  
> We would like to contribute a patch which optimizes tan and log10 on the x86 architecture using the Intel LIBM library.
> Could you please review and sponsor this patch.
>  
> Bug-id:
> https://bugs.openjdk.java.net/browse/JDK-8152907 <https://bugs.openjdk.java.net/browse/JDK-8152907>
> webrev:
> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ <http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/>
>  
> Thanks and regards,
> Vivek

From vladimir.kozlov at oracle.com  Wed Apr  6 01:46:44 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Apr 2016 18:46:44 -0700
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com>
Message-ID: <57046A84.6040707@oracle.com>

Multiple files are not always good. Maybe in the future we can rewrite
this code to use shared parts (code or data). I think the current split is
enough for these changes.

Thanks,
Vladimir

On 4/5/16 6:37 PM, Christian Thalinger wrote:
>
>> On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R
>> <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>> wrote:
>>
>> Hi Christian
>> We have updated the patch as per the suggested changes.
>> The webrev for the same is at this location for your review.
>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>
> There are:
>
>    73 #ifdef _LP64
>
>   368 #else
>
>   655 #endif
>
> in the new files but I don't see them share any code.  Maybe it would be
> better to have dedicated x86_32 and x86_64 files.  Then the ifdefs are
> not required.
>
>> We will soon send another patch for CompilerDirectives changes.
>> Regards,
>> Vivek
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:*Tuesday, March 29, 2016 11:29 AM
>> *To:*Rukmannagari, Shravya
>> *Cc:*Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>>     On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>>     <shravya.rukmannagari at intel.com
>>     <mailto:shravya.rukmannagari at intel.com>> wrote:
>>     Hi Christian,
>>     We would add separate files for each intrinsic. By splitting the
>>     CompilerDirectives, do you mean we have to add a separate file?
>>     Sorry, I didn't exactly get it.
>>
>> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for
>> the CompilerDirectives changes and integrate them separately.
>>
>>
>> Thanks,
>> Shravya Rukmannagari.
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:*Monday, March 28, 2016 5:18 PM
>> *To:*Deshpande, Vivek R <vivek.r.deshpande at intel.com
>> <mailto:vivek.r.deshpande at intel.com>>
>> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
>> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov
>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
>> Rukmannagari, Shravya <shravya.rukmannagari at intel.com
>> <mailto:shravya.rukmannagari at intel.com>>
>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>> I left this comment in the bug:
>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we
>> should put every intrinsic in its own file, like we did for
>> macroAssembler_x86_sha.cpp. They are already too big:
>>
>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>>     4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>>     3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>> Also, can we split out the CompilerDirectives changes?
>>
>>
>>
>>     On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>>     <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>     wrote:
>>     Hi all
>>     We would like to contribute a patch which optimizes tan and log10
>>     X86 architecture using Intel LIBM library.
>>     Could you please review and sponsor this patch.
>>     Bug-id:
>>     https://bugs.openjdk.java.net/browse/JDK-8152907
>>     webrev:
>>     http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>     Thanks and regards,
>>     Vivek
>

From jamsheed.c.m at oracle.com  Wed Apr  6 08:10:52 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Wed, 6 Apr 2016 13:40:52 +0530
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
Message-ID: <5704C48C.2070502@oracle.com>

Thanks for the reply. I'm trying to understand this.

> void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) {
>   // There are potential race conditions during exception cache updates, so we
>   // must own the ExceptionCache_lock before doing ANY modifications. Because
>   // we don't lock during reads, it is possible to have several threads attempt
>   // to update the cache with the same data. We need to check for already inserted
>   // copies of the current data before adding it.
>
>   MutexLocker ml(ExceptionCache_lock);
>   ExceptionCache* target_entry = exception_cache_entry_for_exception(exception);
>
>   if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) {
>     target_entry = new ExceptionCache(exception,pc,handler);
>     add_exception_cache_entry(target_entry);
>   }
> }

[1] There is a storestore memory barrier before count is updated in
add_address_and_handler. This ensures that the exception pc and handler
address are written before count is incremented and before the
ExceptionCache entry is published (in nm->_exception_cache or in the
list via ec->_next).

> address nmethod::handler_for_exception_and_pc(Handle exception, address pc) {
>   // We never grab a lock to read the exception cache, so we may
>   // have false negatives. This is okay, as it can only happen during
>   // the first few exception lookups for a given nmethod.
>   ExceptionCache* ec = exception_cache();
>   while (ec != NULL) {
>     address ret_val;
>     if ((ret_val = ec->match(exception,pc)) != NULL) {
>       return ret_val;
>     }
>     ec = ec->next();
>   }
>   return NULL;
> }

In the read logic, we first check that the ec entry is available (non-null
check) before proceeding further. If ec is non-null, then ec_type, the
exception pc, and the handler are visible because of [1], even though count
can be reordered and not yet show the new value.

This fixes the issue. Why do you think it doesn't?

Best Regards,
Jamsheed


On 4/5/2016 3:40 PM, Doerr, Martin wrote:
>
> Hi Jamsheed,
>
> thanks for pointing me to it. Interesting that you have found such a 
> problem so shortly before me :)
>
> My webrev addresses some aspects which are not covered by your fix:
>
> - add_handler_for_exception_and_pc adds a new ExceptionCache instance
> in the other case. They need to get released as well.
>
> - The readers of the _exception_cache field are not safe, yet. As
> Andrew Haley pointed out, optimizers may modify load accesses for
> non-volatile fields.
>
> So I think my change is still needed.
>
> And after taking a closer look at your change, I think the _count 
> field which is addressed by your fix needs to be volatile as well. I 
> can incorporate that in my change if you like.
>
> Would you agree?
>
> Best regards,
>
> Martin
>
> *From:*hotspot-compiler-dev 
> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of 
> *Jamsheed C m
> *Sent:* Montag, 4. April 2016 08:14
> *To:* hotspot-compiler-dev at openjdk.java.net
> *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not 
> multi-thread safe
>
> Hi Martin,
>
> "nmethod's exception cache not multi-thread safe"  bug is fixed in b107
> bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
> fix changeset: 
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
> discussion link: 
> http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html
>
> Best Regards,
> Jamsheed
>
> On 4/1/2016 6:07 PM, Doerr, Martin wrote:
>
>     Hello everyone,
>
>     we have found a concurrency problem with the nmethod's exception
>     cache. Readers of the cache may read stale data on weak memory
>     platforms.
>
>     The writers of the cache are synchronized by locks, but there may
>     be concurrent readers: The compiler runtimes use
>     nmethod::handler_for_exception_and_pc to access the cache without
>     locking.
>
>     Therefore, the nmethod's field _exception_cache needs to be
>     volatile and adding new entries must be done by releasing stores.
>     (Loading seems to be fine without acquire because there's an
>     address dependency from the load of the cache to the usage of its
>     contents which is sufficient to ensure ordering on all openjdk
>     platforms.)
>
>     I also added a minor cleanup: I changed nmethod::is_alive to read
>     the volatile field _state only once. It is certainly undesired to
>     force the compiler to load it from memory twice.
>
>     Webrev is here:
>
>     http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/
>     <http://cr.openjdk.java.net/%7Emdoerr/8153267_exception_cache/webrev.00/>
>
>     Please review. I will also need a sponsor.
>
>     Best regards,
>
>     Martin
>

From martin.doerr at sap.com  Wed Apr  6 09:19:26 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 6 Apr 2016 09:19:26 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <5704C48C.2070502@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
Message-ID: <b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>

Hi Jamsheed,

here are the cases of add_handler_for_exception_and_pc we should talk about:

Case 1: A new ExceptionCache instance needs to get added.
The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below.
The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL.
One could argue that this is not critical, but I guess this was not intended?

At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do).
I think releasing the completely initialized ExceptionCache instance is a much cleaner design.
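
To make the "release the completely initialized instance" design concrete, here is a minimal sketch in portable C++, assuming a singly linked list whose head plays the role of nmethod::_exception_cache. `Node`, `CacheListModel` and the `std::mutex` (standing in for ExceptionCache_lock) are hypothetical; HotSpot would use its own OrderAccess primitives rather than `std::atomic`. The reader uses acquire where the original post relies on the address dependency, since portable C++ offers no dependable consume ordering.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

using address = const void*;  // stand-in for HotSpot's address typedef

// Hypothetical node: every field is written before the node is published.
struct Node {
  address pc;
  address handler;
  Node* next;
};

struct CacheListModel {
  std::atomic<Node*> head{nullptr};
  std::mutex lock;  // writers are serialized, as with ExceptionCache_lock

  void add(address pc, address handler) {
    std::lock_guard<std::mutex> guard(lock);
    // Fully initialize the node, including next, before publishing it.
    Node* n = new Node{pc, handler, head.load(std::memory_order_relaxed)};
    // Release store: a reader that sees n also sees n->pc, n->handler, n->next.
    head.store(n, std::memory_order_release);
  }

  // Lock-free reader; the acquire load pairs with the release store above.
  address lookup(address pc) const {
    for (Node* n = head.load(std::memory_order_acquire); n != nullptr; n = n->next) {
      if (n->pc == pc) return n->handler;
    }
    return nullptr;
  }

  ~CacheListModel() {
    Node* n = head.load(std::memory_order_relaxed);
    while (n != nullptr) {
      Node* next = n->next;
      delete n;
      n = next;
    }
  }
};
```

Because _next is written before the release store, a reader can never observe a published node with a not-yet-initialized link, which is the Case 1 window described above.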

Case 2: An existing ExceptionCache instance gets a new entry.
In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths.
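
A minimal sketch of the pairing described in Case 2, written in portable C++ rather than HotSpot's OrderAccess API: the writer's release store of the count stands in for the storestore barrier, and the reader's acquire load is the barrier being proposed. `ExceptionCacheNodeModel` and its fixed capacity are hypothetical.

```cpp
#include <atomic>
#include <cassert>

using address = const void*;  // stand-in for HotSpot's address typedef

// Hypothetical model of one ExceptionCache node's pc/handler slots.
struct ExceptionCacheNodeModel {
  static constexpr int kCapacity = 4;
  address pcs[kCapacity] = {};
  address handlers[kCapacity] = {};
  std::atomic<int> count{0};

  // Writer, assumed to run under a lock. The slot is filled first and the
  // count is then published with a release store (the storestore role).
  bool add_address_and_handler(address pc, address handler) {
    int n = count.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;  // caller allocates a new node instead
    pcs[n] = pc;
    handlers[n] = handler;
    count.store(n + 1, std::memory_order_release);
    return true;
  }

  // Lock-free reader. The acquire load pairs with the release store, so
  // every slot below the observed count is fully visible. A stale (smaller)
  // count only produces a false negative.
  address match(address pc) const {
    int n = count.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pcs[i] == pc) return handlers[i];
    }
    return nullptr;
  }
};
```

Without the acquire on the reader side, a weakly ordered CPU may satisfy the slot loads speculatively before the count load, which is the stale-data case Martin describes.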

I have added the acquire barrier for the _count field here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/

Does this answer your questions or is anything still unclear?

Best regards,
Martin


From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
Sent: Mittwoch, 6. April 2016 10:11
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Thanks for the reply. I'm trying to understand this.


void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) {
  // There are potential race conditions during exception cache updates, so we
  // must own the ExceptionCache_lock before doing ANY modifications. Because
  // we don't lock during reads, it is possible to have several threads attempt
  // to update the cache with the same data. We need to check for already inserted
  // copies of the current data before adding it.

  MutexLocker ml(ExceptionCache_lock);
  ExceptionCache* target_entry = exception_cache_entry_for_exception(exception);

  if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) {
    target_entry = new ExceptionCache(exception,pc,handler);
    add_exception_cache_entry(target_entry);
  }
}

[1] There is a storestore memory barrier before count is updated in add_address_and_handler.
This ensures that the exception pc and handler address are written before count is incremented and before the ExceptionCache entry is published (in nm->_exception_cache or in the list via ec->_next).


address nmethod::handler_for_exception_and_pc(Handle exception, address pc) {
  // We never grab a lock to read the exception cache, so we may
  // have false negatives. This is okay, as it can only happen during
  // the first few exception lookups for a given nmethod.
  ExceptionCache* ec = exception_cache();
  while (ec != NULL) {
    address ret_val;
    if ((ret_val = ec->match(exception,pc)) != NULL) {
      return ret_val;
    }
    ec = ec->next();
  }
  return NULL;
}

In the read logic, we first check that the ec entry is available (non-null check) before proceeding further.
If ec is non-null, then ec_type, the exception pc, and the handler are visible because of [1], even though count can be reordered and not yet show the new value.

This fixes the issue. Why do you think it doesn't?

Best Regards,
Jamsheed

On 4/5/2016 3:40 PM, Doerr, Martin wrote:
Hi Jamsheed,

thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :)

My webrev addresses some aspects which are not covered by your fix:

- add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well.

- The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields.

So I think my change is still needed.

And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like.
Would you agree?

Best regards,
Martin


From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m
Sent: Montag, 4. April 2016 08:14
To: hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Martin,

"nmethod's exception cache not multi-thread safe"  bug is fixed in b107
bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html

Best Regards,
Jamsheed
On 4/1/2016 6:07 PM, Doerr, Martin wrote:
Hello everyone,

we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.

The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.)

I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
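
A minimal sketch of the is_alive cleanup, assuming "alive" means in use or not entrant; the state constants and `NMethodStateModel` are hypothetical, not HotSpot's declarations. C++ `volatile` is used here only to mirror the field's declaration: it forces each access to be a real load, but it does not provide inter-thread ordering.

```cpp
#include <cassert>

// Hypothetical state values; HotSpot's nmethod uses its own constants.
enum : unsigned char { kInUse, kNotEntrant, kZombie, kUnloaded };

struct NMethodStateModel {
  volatile unsigned char state = kInUse;

  // Two volatile reads: the compiler must load state twice, and another
  // thread may change it between the two loads.
  bool is_alive_two_loads() const {
    return state == kInUse || state == kNotEntrant;
  }

  // One volatile read into a local; both comparisons use the same snapshot.
  bool is_alive() const {
    unsigned char s = state;
    return s == kInUse || s == kNotEntrant;
  }
};
```

Both variants agree in single-threaded use; the single-load form simply avoids the redundant memory access and compares one consistent value.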

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/

Please review. I will also need a sponsor.

Best regards,
Martin


From tobias.hartmann at oracle.com  Wed Apr  6 10:53:14 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 6 Apr 2016 12:53:14 +0200
Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of <clinit>
Message-ID: <5704EA9A.7020202@oracle.com>

Hi,

please review the following patch that adds support to the Whitebox API to compile the static initializer of a class:

https://bugs.openjdk.java.net/browse/JDK-8153514
http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/
http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/

The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature.

I did not add tests for the new feature because that would require implementing additional methods to check for a successful <clinit> compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.

Thanks,
Tobias

From zoltan.majo at oracle.com  Wed Apr  6 10:59:08 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Wed, 6 Apr 2016 12:59:08 +0200
Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of
	<clinit>
In-Reply-To: <5704EA9A.7020202@oracle.com>
References: <5704EA9A.7020202@oracle.com>
Message-ID: <5704EBFC.5010804@oracle.com>

Hi Tobias,


that looks good to me!

Best regards,


Zoltan


On 04/06/2016 12:53 PM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class:
>
> https://bugs.openjdk.java.net/browse/JDK-8153514
> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/
> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/
>
> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature.
>
> I did not add tests for the new feature because that would require implementing additional methods to check for a successful <clinit> compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
>
> Thanks,
> Tobias


From tobias.hartmann at oracle.com  Wed Apr  6 11:05:52 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 6 Apr 2016 13:05:52 +0200
Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of
	<clinit>
In-Reply-To: <5704EBFC.5010804@oracle.com>
References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com>
Message-ID: <5704ED90.6050803@oracle.com>

Thanks, Zoltan!

Best regards,
Tobias

On 06.04.2016 12:59, Zoltán Majó wrote:
> Hi Tobias,
> 
> 
> that looks good to me!
> 
> Best regards,
> 
> 
> Zoltan
> 
> 
> On 04/06/2016 12:53 PM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153514
>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/
>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/
>>
>> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature.
>>
>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful <clinit> compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
>>
>> Thanks,
>> Tobias
> 

From igor.ignatyev at oracle.com  Wed Apr  6 11:38:42 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 6 Apr 2016 14:38:42 +0300
Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of
	<clinit>
In-Reply-To: <5704ED90.6050803@oracle.com>
References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com>
	<5704ED90.6050803@oracle.com>
Message-ID: <AE677A4A-0327-4D07-ACD4-69A920AEAC1F@oracle.com>

Hi Tobias,

looks good to me, thanks for implementing this.

Thanks,
Igor
> On Apr 6, 2016, at 2:05 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Thanks, Zoltan!
> 
> Best regards,
> Tobias
> 
> On 06.04.2016 12:59, Zoltán Majó wrote:
>> Hi Tobias,
>> 
>> 
>> that looks good to me!
>> 
>> Best regards,
>> 
>> 
>> Zoltan
>> 
>> 
>> On 04/06/2016 12:53 PM, Tobias Hartmann wrote:
>>> Hi,
>>> 
>>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class:
>>> 
>>> https://bugs.openjdk.java.net/browse/JDK-8153514
>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/
>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/
>>> 
>>> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature.
>>> 
>>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful <clinit> compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
>>> 
>>> Thanks,
>>> Tobias
>> 


From tobias.hartmann at oracle.com  Wed Apr  6 11:39:42 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 6 Apr 2016 13:39:42 +0200
Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of
	<clinit>
In-Reply-To: <AE677A4A-0327-4D07-ACD4-69A920AEAC1F@oracle.com>
References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com>
	<5704ED90.6050803@oracle.com>
	<AE677A4A-0327-4D07-ACD4-69A920AEAC1F@oracle.com>
Message-ID: <5704F57E.5000505@oracle.com>

Thanks, Igor!

Best regards,
Tobias

On 06.04.2016 13:38, Igor Ignatyev wrote:
> Hi Tobias,
> 
> looks good to me, thanks for implementing this.
> 
> Thanks,
> Igor
>> On Apr 6, 2016, at 2:05 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>
>> Thanks, Zoltan!
>>
>> Best regards,
>> Tobias
>>
>> On 06.04.2016 12:59, Zoltán Majó wrote:
>>> Hi Tobias,
>>>
>>>
>>> that looks good to me!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>
>>> On 04/06/2016 12:53 PM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8153514
>>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/
>>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/
>>>>
>>>> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature.
>>>>
>>>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful <clinit> compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
>>>>
>>>> Thanks,
>>>> Tobias
>>>
> 

From jamsheed.c.m at oracle.com  Wed Apr  6 11:54:02 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Wed, 6 Apr 2016 17:24:02 +0530
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
Message-ID: <5704F8DA.9030000@oracle.com>

Hi Martin,

On 4/6/2016 2:49 PM, Doerr, Martin wrote:
>
> Hi Jamsheed,
>
> here are the cases of add_handler_for_exception_and_pc we should talk 
> about:
>
> Case 1: A new ExceptionCache instance needs to get added.
>
> The storestore barrier you have added is used in the constructor of 
> the ExceptionCache and it releases the most critical fields of it. I 
> think this is what you explained in [1] in your email below.
>
> The new values of _count and _next fields are written afterwards and 
> hence not covered by this release barrier. Readers of the 
> _exception_cache may read _count==0 or _next==NULL.
>
> One could argue that this is not critical, but I guess this was not 
> intended?
>
> At least the _exception_cache field needs to be volatile to prevent 
> optimizers from breaking anything. This is always needed for fields 
> which are accessed concurrently by multiple threads without locks (as 
> the readers do).
>
> I think releasing the completely initialized ExceptionCache instance 
> is a much cleaner design.
>
Having _count smaller than the actual number of entries, or having _next == NULL, is OK (there is 
always the locked slow path to check again).
Quoting the comment from the read path:
>   // We never grab a lock to read the exception cache, so we may
>   // have false negatives. This is okay, as it can only happen during
>   // the first few exception lookups for a given nmethod.
Weak memory platforms may see a few more false negatives, but isn't 
that OK?
This helps us, as we can keep volatile out of the picture, which is actually good 
for the read paths.

> Case 2: An existing ExceptionCache instance gets a new entry.
>
> In this case your storestore barrier is good to release all updated 
> fields. However, we need to consider the readers, too. The _count 
> field needs to be volatile and the load must acquire. Otherwise, stale 
> data may get read by processors which perform loads on speculative paths.
>
The storestore memory barrier handles this, as count <= the number of real entries, 
and there is always the locked slow path to check again.
As said before, there may be a few more false negatives on weak memory 
platforms than on strong ones.

Best Regards,
Jamsheed

> I have added the acquire barrier for the _count field here:
>
> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ 
>
> Does this answer your questions or is anything still unclear?
>
> Best regards,
>
> Martin
>
> *From:*Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
> *Sent:* Mittwoch, 6. April 2016 10:11
> *To:* Doerr, Martin <martin.doerr at sap.com>; 
> hotspot-compiler-dev at openjdk.java.net
> *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not 
> multi-thread safe
>
> Thanks for the reply. I'm trying to understand this.
>
>
>     void nmethod::add_handler_for_exception_and_pc(Handle exception,
>     address pc, address handler) {
>       // There are potential race conditions during exception cache
>     updates, so we
>       // must own the ExceptionCache_lock before doing ANY
>     modifications. Because
>       // we don't lock during reads, it is possible to have several
>     threads attempt
>       // to update the cache with the same data. We need to check for
>     already inserted
>       // copies of the current data before adding it.
>
>       MutexLocker ml(ExceptionCache_lock);
>       ExceptionCache* target_entry =
>     exception_cache_entry_for_exception(exception);
>
>       if (target_entry == NULL ||
>     !target_entry->add_address_and_handler(pc,handler)) {
>         target_entry = new ExceptionCache(exception,pc,handler);
>         add_exception_cache_entry(target_entry);
>       }
>     }
>
>
> [1] There is a storestore memory barrier before count is updated in 
> add_address_and_handler.
> This ensures that the exception pc and handler address are updated before count 
> is incremented and before the ExceptionCache entry is stored (into 
> nm->_exception_cache or into the list via ec->_next).
>
>
>     address nmethod::handler_for_exception_and_pc(Handle exception,
>     address pc) {
>       // We never grab a lock to read the exception cache, so we may
>       // have false negatives. This is okay, as it can only happen during
>       // the first few exception lookups for a given nmethod.
>       ExceptionCache* ec = exception_cache();
>       while (ec != NULL) {
>         address ret_val;
>         if ((ret_val = ec->match(exception,pc)) != NULL) {
>           return ret_val;
>         }
>         ec = ec->next();
>       }
>       return NULL;
>     }
>
>
> And in the read logic, we first check that the ec entry is available (non-null 
> check) before proceeding further.
> If ec is non-null, then ec_type, exception pc, and handler are available 
> by [1], though count can be reordered and not yet updated with the new value.
>
> This fixes the issue. Why do you think it doesn't?
>
> Best Regards,
> Jamsheed
>
> On 4/5/2016 3:40 PM, Doerr, Martin wrote:
>
>     Hi Jamsheed,
>
>     thanks for pointing me to it. Interesting that you have found such
>     a problem so shortly before me :)
>
>     My webrev addresses some aspects which are not covered by your fix:
>
>     -add_handler_for_exception_and_pc adds a new ExceptionCache
>     instance in the other case. They need to get released as well.
>
>     -The readers of the _exception_cache field are not safe, yet. As
>     Andrew Haley pointed out, optimizers may modify load accesses for
>     non-volatile fields.
>
>     So I think my change is still needed.
>
>     And after taking a closer look at your change, I think the _count
>     field which is addressed by your fix needs to be volatile as well.
>     I can incorporate that in my change if you like.
>
>     Would you agree?
>
>     Best regards,
>
>     Martin
>
>     *From:*hotspot-compiler-dev
>     [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf
>     Of *Jamsheed C m
>     *Sent:* Montag, 4. April 2016 08:14
>     *To:* hotspot-compiler-dev at openjdk.java.net
>     *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not
>     multi-thread safe
>
>     Hi Martin,
>
>     "nmethod's exception cache not multi-thread safe"  bug is fixed in
>     b107
>     bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
>     fix changeset:
>     http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
>     discussion link:
>     http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html
>
>     Best Regards,
>     Jamsheed
>
>     On 4/1/2016 6:07 PM, Doerr, Martin wrote:
>
>         Hello everyone,
>
>         we have found a concurrency problem with the nmethod's
>         exception cache. Readers of the cache may read stale data on
>         weak memory platforms.
>
>         The writers of the cache are synchronized by locks, but there
>         may be concurrent readers: The compiler runtimes use
>         nmethod::handler_for_exception_and_pc to access the cache
>         without locking.
>
>         Therefore, the nmethod's field _exception_cache needs to be
>         volatile and adding new entries must be done by releasing
>         stores. (Loading seems to be fine without acquire because
>         there's an address dependency from the load of the cache to
>         the usage of its contents which is sufficient to ensure
>         ordering on all openjdk platforms.)
>
>         I also added a minor cleanup: I changed nmethod::is_alive to
>         read the volatile field _state only once. It is certainly
>         undesired to force the compiler to load it from memory twice.
>
>         Webrev is here:
>
>         http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/
>
>         Please review. I will also need a sponsor.
>
>         Best regards,
>
>         Martin
>

From martin.doerr at sap.com  Wed Apr  6 13:24:54 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 6 Apr 2016 13:24:54 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <5704F8DA.9030000@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
Message-ID: <a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>

Hi Jamsheed and all,

thanks for your explanation.

About Case 1:
I basically agree that reading _next==NULL or _count==0 only leads to false negatives and is not critical.
Yes, we could live with a few more false negatives on weak memory model platforms (even though this is not my preferred design).
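The reasoning both sides accept here (a missed entry only costs a trip through the locked slow path) can be illustrated with a minimal, hypothetical C++11 sketch. ToyHandlerCache, lookup and lookup_or_compute are invented names, not HotSpot APIs; the real cache is a linked list of ExceptionCache nodes keyed by exception type and pc, and HotSpot uses its own OrderAccess primitives rather than std::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical direct-mapped cache: slot pc holds the handler for that pc,
// or 0 when nothing has been cached yet. Valid pcs are 0..kSlots-1.
class ToyHandlerCache {
 public:
  static constexpr int kSlots = 16;

  // Lock-free fast path. An insert still in flight on another thread may
  // not be visible yet; that yields a miss (a false negative), never a
  // wrong handler. Release/acquire stands in for publishing a value that,
  // in the real cache, would point at further data.
  int lookup(int pc) const {
    assert(pc >= 0 && pc < kSlots);
    return slots_[pc].load(std::memory_order_acquire);
  }

  // Slow path: on a miss, compute the handler and insert it under the lock,
  // re-checking first because another thread may have inserted it meanwhile.
  int lookup_or_compute(int pc, int (*compute_handler)(int)) {
    int h = lookup(pc);
    if (h != 0) return h;  // fast-path hit
    h = compute_handler(pc);
    std::lock_guard<std::mutex> guard(lock_);
    if (slots_[pc].load(std::memory_order_relaxed) == 0) {
      slots_[pc].store(h, std::memory_order_release);
      ++inserts_;
    }
    return h;
  }

  // Number of entries actually inserted (for the usage example).
  int inserts() const {
    std::lock_guard<std::mutex> guard(lock_);
    return inserts_;
  }

 private:
  std::atomic<int> slots_[kSlots] = {};
  mutable std::mutex lock_;
  int inserts_ = 0;
};
```

A second lookup of the same pc is served by the fast path, so the locked insert happens only once; a reader that raced with that insert would simply have taken the slow path again.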

About Case 2:
What I'm missing on the reader's side of the _count field is something which prevents processors from speculatively loading the contents of the ExceptionCache.

In ExceptionCache::test_address, the _count only affects the control flow.
PPC and ARM processors can predict branches which depend on the _count field and load speculatively from the pc and handler fields (which may be stale data!).
Due to out-of-order execution of the loads, it can actually happen that the new _count value is observed but stale data is read from the pc and handler fields.
I guess it is highly unlikely that we will ever observe this, but there's no guarantee.
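The reader-side pairing asked for here can be sketched with C++11 atomics, assuming a toy node with a fixed number of (pc, handler) slots. ToyNode, add and match are hypothetical names, not HotSpot code. The acquire load of the count is what rules out stale slot contents, even if the branch on the count is predicted and the slot loads are issued speculatively.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of one cache node holding up to kCapacity (pc, handler) pairs.
class ToyNode {
 public:
  static constexpr int kCapacity = 4;

  // Writer side: the caller is assumed to hold the cache lock (single writer).
  bool add(int pc, int handler) {
    int n = count_.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;
    pc_[n] = pc;
    handler_[n] = handler;
    // Release: the slot contents become visible no later than the new count.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side: lock-free. The acquire load pairs with the release store
  // above, so a reader that observes count == n also observes slots [0, n).
  int match(int pc) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pc_[i] == pc) return handler_[i];
    }
    return 0;  // not found (possibly a benign false negative)
  }

 private:
  int pc_[kCapacity] = {};
  int handler_[kCapacity] = {};
  std::atomic<int> count_{0};
};
```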

I think my concern about using non-volatile fields for the _exception_cache is also still valid.
Nothing prevents C++ Compilers from loading the pointer twice from memory. They may expect to get the pointer to the same instance both times but actually get two different ones.
For example, this may lead to the situation that handler_for_exception_and_pc uses one ExceptionCache instance for calling the match function and another one (due to a reload of the non-volatile field) for calling next().
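The usual remedy for this compiler-reload concern is to read the shared head through an atomic (or volatile) access exactly once and then work only with the local copy. A minimal sketch under that assumption, with a hypothetical ToyList standing in for the ExceptionCache chain and nodes prepended by a single, externally locked writer:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical singly linked list; nodes are prepended by one writer at a
// time and are never modified after they have been published.
struct Node {
  int key;
  int value;
  const Node* next;
};

class ToyList {
 public:
  ~ToyList() {
    const Node* n = head_.load(std::memory_order_relaxed);
    while (n != nullptr) {
      const Node* next = n->next;
      delete n;
      n = next;
    }
  }

  // Writer: assumed to be serialized by an external lock.
  void prepend(int key, int value) {
    const Node* n = new Node{key, value, head_.load(std::memory_order_relaxed)};
    head_.store(n, std::memory_order_release);
  }

  // Reader: the shared head is loaded exactly once into 'cur'. The key test,
  // the value read and the step to next all go through that local, so the
  // walk sees one consistent node at a time. Because head_ is atomic, the
  // compiler may not replace 'cur' with a second load of the field.
  int find(int key) const {
    const Node* cur = head_.load(std::memory_order_acquire);
    while (cur != nullptr) {
      if (cur->key == key) return cur->value;
      cur = cur->next;
    }
    return -1;  // not found
  }

 private:
  std::atomic<const Node*> head_{nullptr};
};
```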

Would other people like to comment on this lengthy discussion as well?

Best regards,
Martin


From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
Sent: Mittwoch, 6. April 2016 13:54
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Martin,

On 4/6/2016 2:49 PM, Doerr, Martin wrote:

Hi Jamsheed,

here are the cases of add_handler_for_exception_and_pc we should talk about:

Case 1: A new ExceptionCache instance needs to get added.
The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below.
The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL.
One could argue that this is not critical, but I guess this was not intended?

At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do).
I think releasing the completely initialized ExceptionCache instance is a much cleaner design.
Having _count smaller than the actual number of entries, or having _next == NULL, is OK (there is always the locked slow path to check again).
Quoting the comment from the read path:

  // We never grab a lock to read the exception cache, so we may
  // have false negatives. This is okay, as it can only happen during
  // the first few exception lookups for a given nmethod.
Weak memory platforms may see a few more false negatives, but isn't that OK?
This helps us, as we can keep volatile out of the picture, which is actually good for the read paths.


Case 2: An existing ExceptionCache instance gets a new entry.
In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths.
The storestore memory barrier handles this, as count <= the number of real entries, and there is always the locked slow path to check again.
As said before, there may be a few more false negatives on weak memory platforms than on strong ones.

Best Regards,
Jamsheed


I have added the acquire barrier for the _count field here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/

Does this answer your questions or is anything still unclear?

Best regards,
Martin


From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
Sent: Mittwoch, 6. April 2016 10:11
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Thanks for the reply. I'm trying to understand this.


void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) {
  // There are potential race conditions during exception cache updates, so we
  // must own the ExceptionCache_lock before doing ANY modifications. Because
  // we don't lock during reads, it is possible to have several threads attempt
  // to update the cache with the same data. We need to check for already inserted
  // copies of the current data before adding it.

  MutexLocker ml(ExceptionCache_lock);
  ExceptionCache* target_entry = exception_cache_entry_for_exception(exception);

  if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) {
    target_entry = new ExceptionCache(exception,pc,handler);
    add_exception_cache_entry(target_entry);
  }
}

[1] There is a storestore memory barrier before count is updated in add_address_and_handler.
This ensures that the exception pc and handler address are updated before count is incremented and before the ExceptionCache entry is stored (into nm->_exception_cache or into the list via ec->_next).
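The scheme in [1] relies on a writer-side storestore barrier. As a rough portable analogue (an assumption, not HotSpot's implementation), a C++11 release fence before a relaxed store of the count orders the slot writes before the count update. In the C++ model that guarantee only holds for readers that pair it with an acquire fence (or an acquire load) after reading the count, which is the reader-side ordering under discussion in this thread. FencedNode, fenced_add and fenced_match are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical cache node used only for illustration (not HotSpot code).
struct FencedNode {
  static constexpr int kCapacity = 4;
  int pc[kCapacity] = {};
  int handler[kCapacity] = {};
  std::atomic<int> count{0};
};

// Writer side, assumed to run under the cache lock (one writer at a time).
bool fenced_add(FencedNode& node, int pc, int handler) {
  int n = node.count.load(std::memory_order_relaxed);
  if (n >= FencedNode::kCapacity) return false;
  node.pc[n] = pc;
  node.handler[n] = handler;
  // Orders the slot stores before the count store (a release fence is
  // slightly stronger than a pure storestore barrier).
  std::atomic_thread_fence(std::memory_order_release);
  node.count.store(n + 1, std::memory_order_relaxed);
  return true;
}

// Reader side, lock-free. Without the acquire fence, nothing in the C++
// model stops the slot loads from being satisfied before the count load.
int fenced_match(const FencedNode& node, int pc) {
  int n = node.count.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);
  for (int i = 0; i < n; i++) {
    if (node.pc[i] == pc) return node.handler[i];
  }
  return 0;  // not found (possibly a benign false negative)
}
```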


address nmethod::handler_for_exception_and_pc(Handle exception, address pc) {
  // We never grab a lock to read the exception cache, so we may
  // have false negatives. This is okay, as it can only happen during
  // the first few exception lookups for a given nmethod.
  ExceptionCache* ec = exception_cache();
  while (ec != NULL) {
    address ret_val;
    if ((ret_val = ec->match(exception,pc)) != NULL) {
      return ret_val;
    }
    ec = ec->next();
  }
  return NULL;
}

And in the read logic, we first check that the ec entry is available (non-null check) before proceeding further.
If ec is non-null, then ec_type, exception pc, and handler are available by [1], though count can be reordered and not yet updated with the new value.

This fixes the issue. Why do you think it doesn't?

Best Regards,
Jamsheed


On 4/5/2016 3:40 PM, Doerr, Martin wrote:
Hi Jamsheed,

thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :)

My webrev addresses some aspects which are not covered by your fix:

- add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well.

- The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields.

So I think my change is still needed.

And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like.
Would you agree?

Best regards,
Martin


From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m
Sent: Montag, 4. April 2016 08:14
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Martin,

"nmethod's exception cache not multi-thread safe"  bug is fixed in b107
bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html

Best Regards,
Jamsheed
On 4/1/2016 6:07 PM, Doerr, Martin wrote:
Hello everyone,

we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.

The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.)

I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
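The is_alive cleanup amounts to copying the volatile field into a local before comparing. A hypothetical sketch (the ToyState values and the ToyMethod type are invented here; the real nmethod state enum may differ):

```cpp
#include <cassert>

// Hypothetical state values; the real nmethod uses its own enum.
enum ToyState : unsigned char { kInUse = 0, kNotEntrant = 1, kZombie = 2, kUnloaded = 3 };

struct ToyMethod {
  volatile unsigned char state = kInUse;

  // Two volatile reads: the compiler must load 'state' from memory twice,
  // and the two loads may even observe different values.
  bool is_alive_two_loads() const {
    return state == kInUse || state == kNotEntrant;
  }

  // One volatile read into a local, then two cheap register compares.
  bool is_alive() const {
    unsigned char s = state;
    return s == kInUse || s == kNotEntrant;
  }
};
```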

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/

Please review. I will also need a sponsor.

Best regards,
Martin


From aph at redhat.com  Wed Apr  6 16:01:19 2016
From: aph at redhat.com (Andrew Haley)
Date: Wed, 6 Apr 2016 17:01:19 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: Add Arrays.fill stub code
In-Reply-To: <CADwSXq8Vk4Sccy_ZLK+GR-hjaw+RCVsKovErv6UO_t7UMS857w@mail.gmail.com>
References: <CADwSXq8Vk4Sccy_ZLK+GR-hjaw+RCVsKovErv6UO_t7UMS857w@mail.gmail.com>
Message-ID: <570532CF.1050903@redhat.com>

On 04/06/2016 01:51 PM, Long Chen wrote:

> Please review this patch for generating stub code for ArrayFill on aarch64
> platform.
> 
> Performance test case:
> http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.java
> Testing result: http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.html
> Patch: http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.patch
> 
> At the same time, the patch refactors ClearArrayNode's code generation, as it can be
> used by the array fill too.

Looks good.  Minor nit:

+  // Generate stub for disjoint fill. If "aligned" is true, the
+  // "to" address is assumed to be heapword aligned.

"disjoint" doesn't make any sense here.

> As a follow-up, I would like to propose a patch to use DC ZVA for large array
> zeroing.

OK.  I haven't seen much advantage to using DC ZVA, but I'm
prepared to listen.

Andrew.


From vladimir.kozlov at oracle.com  Wed Apr  6 16:50:34 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 6 Apr 2016 09:50:34 -0700
Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of
	<clinit>
In-Reply-To: <5704EA9A.7020202@oracle.com>
References: <5704EA9A.7020202@oracle.com>
Message-ID: <57053E5A.6050705@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/6/16 3:53 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class:
>
> https://bugs.openjdk.java.net/browse/JDK-8153514
> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/
> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/
>
> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature.
>
> I did not add tests for the new feature because that would require implementing additional methods to check for a successful <clinit> compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
>
> Thanks,
> Tobias
>

From vivek.r.deshpande at intel.com  Wed Apr  6 17:14:58 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Wed, 6 Apr 2016 17:14:58 +0000
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <5704312C.9000605@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<5704211E.5090007@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>
	<57042460.5070306@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com>
	<5704312C.9000605@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com>

Hi Vladimir

Please let me know if I need to provide an updated patch with this change.
Thanks for all your help.

Regards,
Vivek

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Tuesday, April 05, 2016 2:42 PM
To: Deshpande, Vivek R; Rukmannagari, Shravya
Cc: hotspot compiler
Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86

Problem found during build. Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp:

hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: 
error: use of undeclared identifier 'SharedRuntime'
__ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));

Note templateInterpreterGenerator_x86_32.cpp has that #include.

It failed on macosx, where -DDONT_USE_PRECOMPILED_HEADER is used.

Vladimir


On 4/5/16 2:27 PM, Deshpande, Vivek R wrote:
> HI Vladimir
>
> Sorry about that.
> Please check this webrev
> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/
> I have updated it.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, April 05, 2016 1:47 PM
> To: Deshpande, Vivek R; Rukmannagari, Shravya
> Cc: hotspot compiler
> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>
> I again can't apply the changes because of CR characters at the ends of lines in the patch file.
>
> Vladimir
>
> On 4/5/16 1:41 PM, Deshpande, Vivek R wrote:
>>
>> Hi Vladimir
>>
>> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed.
>> Thank you for the review.
>>
>> Regards,
>> Vivek
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, April 05, 2016 1:34 PM
>> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya
>> Cc: hotspot compiler
>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>> It looks good to me, but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them?
>>
>> I will start pre-integration testing.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote:
>>> Hi Christian
>>>
>>> We have updated the patch as per the suggested changes.
>>>
>>> The webrev for the same is at this location for your review.
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>>>
>>> We will soon send another patch for CompilerDirectives changes.
>>>
>>> Regards,
>>>
>>> Vivek
>>>
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:* Tuesday, March 29, 2016 11:29 AM
>>> *To:* Rukmannagari, Shravya
>>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86
>>>
>>>       On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>>>       <shravya.rukmannagari at intel.com
>>>       <mailto:shravya.rukmannagari at intel.com>> wrote:
>>>
>>>       Hi Christian,
>>>
>>>       We would add separate files for each intrinsic. By splitting out the
>>>       CompilerDirectives changes, do you mean we have to add a separate file?
>>>       Sorry, I didn't quite get it.
>>>
>>> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for 
>>> the CompilerDirectives changes and integrate them separately.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Shravya Rukmannagari.
>>>
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:*Monday, March 28, 2016 5:18 PM
>>> *To:*Deshpande, Vivek R 
>>> <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
>>> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov 
>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
>>> Rukmannagari, Shravya <shravya.rukmannagari at intel.com 
>>> <mailto:shravya.rukmannagari at intel.com>>
>>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>>
>>> I left this comment in the bug:
>>>
>>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files 
>>> we should put every intrinsic in its own file, like we did for 
>>> macroAssembler_x86_sha.cpp. They are already too big:
>>>
>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>>>        4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>>>        3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>>>
>>> Also, can we split out the CompilerDirectives changes?
>>>
>>>
>>>
>>>
>>>       On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>>>       <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>>       wrote:
>>>
>>>       Hi all
>>>
>>>       We would like to contribute a patch which optimizes tan and log10
>>>       on the x86 architecture using the Intel LIBM library.
>>>
>>>       Could you please review and sponsor this patch.
>>>
>>>       Bug-id:
>>>
>>>       https://bugs.openjdk.java.net/browse/JDK-8152907
>>>       webrev:
>>>
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>>
>>>       Thanks and regards,
>>>
>>>       Vivek
>>>

From christian.thalinger at oracle.com  Wed Apr  6 17:14:56 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 6 Apr 2016 07:14:56 -1000
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <57046A84.6040707@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com>
	<57046A84.6040707@oracle.com>
Message-ID: <AA39B09F-952B-49E6-A4B3-375DDEC17989@oracle.com>


> On Apr 5, 2016, at 3:46 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Multiple files are not always good. Maybe in the future we can rewrite this code to use shared parts (code or data). I think the current split is enough for these changes.

Alright.

> 
> Thanks,
> Vladimir
> 
> On 4/5/16 6:37 PM, Christian Thalinger wrote:
>> 
>>> On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R
>>> <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>> wrote:
>>> 
>>> Hi Christian
>>> We have updated the patch as per the suggested changes.
>>> The webrev for the same is at this location for your review.
>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>> 
>> There are:
>> 
>>   73 #ifdef _LP64
>> 
>>  368 #else
>> 
>>  655 #endif
>> 
>> in the new files but I don't see them share any code.  Maybe it would be
>> better to have dedicated x86_32 and x86_64 files.  Then the ifdefs are
>> not required.
>> 
>>> We will soon send another patch for CompilerDirectives changes.
>>> Regards,
>>> Vivek
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:*Tuesday, March 29, 2016 11:29 AM
>>> *To:*Rukmannagari, Shravya
>>> *Cc:*Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
>>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>> 
>>>    On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>>>    <shravya.rukmannagari at intel.com
>>>    <mailto:shravya.rukmannagari at intel.com>> wrote:
>>>    Hi Christian,
>>>    We would add separate files for each intrinsic. By splitting out the
>>>    CompilerDirectives changes, do you mean we have to add a separate file?
>>>    Sorry, I didn't quite get it.
>>> 
>>> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for
>>> the CompilerDirectives changes and integrate them separately.
>>> 
>>> 
>>> Thanks,
>>> Shravya Rukmannagari.
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:*Monday, March 28, 2016 5:18 PM
>>> *To:*Deshpande, Vivek R <vivek.r.deshpande at intel.com
>>> <mailto:vivek.r.deshpande at intel.com>>
>>> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
>>> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
>>> Rukmannagari, Shravya <shravya.rukmannagari at intel.com
>>> <mailto:shravya.rukmannagari at intel.com>>
>>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>> I left this comment in the bug:
>>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we
>>> should put every intrinsic in its own file, like we did for
>>> macroAssembler_x86_sha.cpp. They are already too big:
>>> 
>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>>>    4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>>>    3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>>> Also, can we split out the CompilerDirectives changes?
>>> 
>>> 
>>> 
>>>    On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>>>    <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>>    wrote:
>>>    Hi all
>>>    We would like to contribute a patch which optimizes tan and log10
>>>    on the x86 architecture using the Intel LIBM library.
>>>    Could you please review and sponsor this patch.
>>>    Bug-id:
>>>    https://bugs.openjdk.java.net/browse/JDK-8152907
>>>    webrev:
>>>    http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>>    Thanks and regards,
>>>    Vivek
>> 


From vladimir.kozlov at oracle.com  Wed Apr  6 17:17:47 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 6 Apr 2016 10:17:47 -0700
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<5704211E.5090007@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>
	<57042460.5070306@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com>
	<5704312C.9000605@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com>
Message-ID: <570544BB.6040509@oracle.com>

I added the #include myself and PIT testing passed.
I need to know who the author or contributors are.

Thanks,
Vladimir

On 4/6/16 10:14 AM, Deshpande, Vivek R wrote:
> Hi Vladimir
>
> Please let me know if I need to provide an updated patch with this change.
> Thanks for all your help.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, April 05, 2016 2:42 PM
> To: Deshpande, Vivek R; Rukmannagari, Shravya
> Cc: hotspot compiler
> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>
> Problem found during build. Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp:
>
> hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56:
> error: use of undeclared identifier 'SharedRuntime'
> __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));
>
> Note templateInterpreterGenerator_x86_32.cpp has that #include.
>
> It failed on macosx, where -DDONT_USE_PRECOMPILED_HEADER is used.
>
> Vladimir
>
>
>
> On 4/5/16 2:27 PM, Deshpande, Vivek R wrote:
>> HI Vladimir
>>
>> Sorry about that.
>> Please check this webrev:
>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/
>> I have updated it.
>>
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, April 05, 2016 1:47 PM
>> To: Deshpande, Vivek R; Rukmannagari, Shravya
>> Cc: hotspot compiler
>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>>
>> I again can't apply the changes because of CR characters at the ends of lines in the patch file.
>>
>> Vladimir
>>
>> On 4/5/16 1:41 PM, Deshpande, Vivek R wrote:
>>>
>>> Hi Vladimir
>>>
>>> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed.
>>> Thank you for the review.
>>>
>>> Regards,
>>> Vivek
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Tuesday, April 05, 2016 1:34 PM
>>> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya
>>> Cc: hotspot compiler
>>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86
>>>
>>> It looks good to me, but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them?
>>>
>>> I will start pre-integration testing.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote:
>>>> Hi Christian
>>>>
>>>> We have updated the patch as per the suggested changes.
>>>>
>>>> The webrev for the same is at this location for your review.
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/
>>>>
>>>> We will soon send another patch for CompilerDirectives changes.
>>>>
>>>> Regards,
>>>>
>>>> Vivek
>>>>
>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>> *Sent:* Tuesday, March 29, 2016 11:29 AM
>>>> *To:* Rukmannagari, Shravya
>>>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler
>>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86
>>>>
>>>>        On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya
>>>>        <shravya.rukmannagari at intel.com
>>>>        <mailto:shravya.rukmannagari at intel.com>> wrote:
>>>>
>>>>        Hi Christian,
>>>>
>>>>        We will add separate files for each intrinsic. By splitting out the
>>>>        CompilerDirectives changes, do you mean we have to add a separate file?
>>>>        Sorry, I didn't quite get it.
>>>>
>>>> Oh, sorry, I wasn't clear enough.  Please file a new enhancement for
>>>> the CompilerDirectives changes and integrate them separately.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Shravya Rukmannagari.
>>>>
>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>> *Sent:* Monday, March 28, 2016 5:18 PM
>>>> *To:* Deshpande, Vivek R
>>>> <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>>> *Cc:*hotspot compiler <hotspot-compiler-dev at openjdk.java.net
>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>;
>>>> Rukmannagari, Shravya <shravya.rukmannagari at intel.com
>>>> <mailto:shravya.rukmannagari at intel.com>>
>>>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86
>>>>
>>>> I left this comment in the bug:
>>>>
>>>> I think for the sanity of the macroAssembler_libm_x86_*.cpp files
>>>> we should put every intrinsic in its own file, like we did for
>>>> macroAssembler_x86_sha.cpp. They are already too big:
>>>>
>>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp
>>>>         4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp
>>>>         3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp
>>>>
>>>> Also, can we split out the CompilerDirectives changes?
>>>>
>>>>
>>>>
>>>>
>>>>        On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R
>>>>        <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
>>>>        wrote:
>>>>
>>>>        Hi all
>>>>
>>>>        We would like to contribute a patch which optimizes tan and log10 on the
>>>>        x86 architecture using the Intel LIBM library.
>>>>
>>>>        Could you please review and sponsor this patch.
>>>>
>>>>        Bug-id:
>>>>
>>>>        https://bugs.openjdk.java.net/browse/JDK-8152907
>>>>        webrev:
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>>>
>>>>        Thanks and regards,
>>>>
>>>>        Vivek
>>>>
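The patch twice failed to apply because its lines ended in CR characters. As a minimal sketch, assuming the patch text is already held in memory, the hypothetical helper `crlf_to_lf` below (not part of Mercurial, webrev, or any OpenJDK tool) shows the normalization a contributor could apply before regenerating a patch on a system that writes CRLF line endings:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert CRLF ("\r\n") line endings to LF ("\n").
// A '\r' that is not immediately followed by '\n' is copied unchanged.
std::string crlf_to_lf(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
      continue;  // drop the CR; the LF is copied on the next iteration
    }
    out.push_back(text[i]);
  }
  return out;
}
```

Only CR characters directly before an LF are removed, so a stray CR inside a line is left alone rather than silently rewritten.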

From vivek.r.deshpande at intel.com  Wed Apr  6 17:24:01 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Wed, 6 Apr 2016 17:24:01 +0000
Subject: RFR (M): 8152907: Update for tan and log10 for x86
In-Reply-To: <570544BB.6040509@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com>
	<98666E26-763E-40E9-838B-B612D4BAF468@oracle.com>
	<8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com>
	<97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com>
	<5704211E.5090007@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com>
	<57042460.5070306@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com>
	<5704312C.9000605@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com>
	<570544BB.6040509@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB87@ORSMSX106.amr.corp.intel.com>

Thanks Vladimir.

Code contributed by: 
Shravya Rukmannagari (shravya.rukmannagari at intel.com) and Vivek Deshpande (vivek.r.deshpande at intel.com)

Regards,
Vivek



From igor.veresov at oracle.com  Wed Apr  6 17:49:31 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 6 Apr 2016 10:49:31 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>
	<1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com>
	<4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com>
	<9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com>
Message-ID: <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com>

Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it. We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately.

Thanks,
igor
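The change under review moves the invokeinterface check from runtime to link-time resolution because it depends only on the resolved method and the call-site bytecode, never on the receiver. The sketch below is a simplified, hypothetical model of the rule Lois quotes from the JVMS (an ICCE when invokeinterface resolves to a static or private method); the enum, struct, and function names are illustrative and are not HotSpot's LinkResolver API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical model only: these names are NOT HotSpot's LinkResolver API.
enum class Bytecode { invokestatic, invokespecial, invokevirtual, invokeinterface };

// The facts about the resolved method that the check needs; no receiver.
struct ResolvedMethod {
  std::string name;
  bool is_static;
  bool is_private;
};

// Stand-in for java.lang.IncompatibleClassChangeError.
struct IncompatibleClassChangeError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Link-time check: invokeinterface must not resolve to a static or private
// method. The decision uses only the resolved method and the bytecode, which
// is why it can run at link time rather than during receiver-based selection.
void linktime_check_interface_call(const ResolvedMethod& m, Bytecode code) {
  if (code == Bytecode::invokeinterface && (m.is_static || m.is_private)) {
    throw IncompatibleClassChangeError(
        "invokeinterface resolved to " +
        std::string(m.is_static ? "static" : "private") +
        " method " + m.name);
  }
}
```

Because the check never needs a receiver, a compiler interface that performs only link-time resolution (such as the ciEnv::lookup_method() and ConstantPool.lookupMethod() paths mentioned in the review) could observe the same error while parsing.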

> On Apr 5, 2016, at 4:22 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> 
>> On Apr 5, 2016, at 3:33 PM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
>> 
>> Igor,
>> 
>> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter
>> for instance?
> 
> Yes, I ran our RBT round of testing, which does that with -Xcomp and -Xmixed.
> 
>> 
>> If so, I am ok with checking this in - further notes below.
>> 
>>> On Apr 5, 2016, at 3:43 PM, Igor Veresov <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
>>> 
>>> 
>>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear <karen.kinnear at oracle.com <mailto:karen.kinnear at oracle.com>> wrote:
>>>> 
>>>> I am in agreement with Lois that the JVMS looks good with moving the exception.
>>> 
>>> Thanks!
>>>> 
>>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next
>>>> meeting I will check one more time. It might be worth adding a comment.
>>> 
>>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/>
>>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle.
>>> 
>>>> 
>>>> My concern is specific to jvmciCompilerToVM.cpp: there is code called resolveMethod that checks
>>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null.
>>>> 
>>> 
>>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 <https://bugs.openjdk.java.net/browse/JDK-8152903>
>>> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp).
>> 
>> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match
>> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup.
>> That is ok with me - I will add a note to the bug.
> 
> Could you please explain what the problem is again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null for an interface holder when in reality it was an invokevirtual instruction, and vice versa?
> 
>> 
>> Also: I see a ciMethod::check_call that has a comment - 
>> IT appears to fail when applied to an invoke interface call site.
>> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. 
>> 
> 
> This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas?
> 
> igor
> 
>> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take
>> the subtleties of invoke interface and invoke special into account. 
>>> 
>>> igor
>>> 
>>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the
>>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code,
>>>> so that you get the correct behavior depending on the requesting byte code.
>>>> 
>>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so
>>>> I could use help studying this a bit more to understand if this really is resolution or is really selection.
>>>> 
>>>> thanks,
>>>> Karen
>>>> 
>>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>> 
>>>>> 
>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>>>> Hi Lois,
>>>>>> 
>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>>> 
>>>>>> igor
>>>>> Hi Igor,
>>>>> 
>>>>> Thanks for waiting on this.  A couple of comments:
>>>>> 
>>>>> - Section 6.5 (Instructions) does indicate, in the "Linking Exceptions" for invokeinterface, that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>>>> 
>>>>> - I'm concerned about the change on line #998, #1030, #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>>>> 
>>>>> Just curious did you also run the testbase default methods tests?
>>>>> Lois
>>>>> 
>>>>>> 
>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>> 
>>>>>>> Hi Igor,
>>>>>>> 
>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Lois
>>>>>>> 
>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>>> 
>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 <https://bugs.openjdk.java.net/browse/JDK-8153115>
>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/>
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> igor
> 
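Karen's suggestion is to choose the resolution path from the requesting bytecode rather than from holder_klass->is_interface. A minimal sketch under stated assumptions: all names below are hypothetical, and for invokestatic/invokespecial the holder kind stands in for the constant-pool entry kind (Methodref vs. InterfaceMethodref) that real resolution would consult.

```cpp
#include <cassert>

// Hypothetical names; not HotSpot's LinkResolver or JVMCI API.
enum class Bytecode { invokestatic, invokespecial, invokevirtual, invokeinterface };
enum class ResolutionPath { class_method, interface_method };

// Pick the link-time resolution path from the bytecode that asked for it.
// invokevirtual always takes the class-method path and invokeinterface the
// interface-method path. Per JVMS 5.4.3.3 and 5.4.3.4, each of those
// resolutions throws IncompatibleClassChangeError when the holder has the
// wrong kind, so a mismatch surfaces as an error instead of being routed
// through the other path.
ResolutionPath choose_resolution_path(Bytecode code, bool holder_is_interface) {
  switch (code) {
    case Bytecode::invokevirtual:
      return ResolutionPath::class_method;
    case Bytecode::invokeinterface:
      return ResolutionPath::interface_method;
    case Bytecode::invokestatic:
    case Bytecode::invokespecial:
      // Assumption: holder kind approximates the constant-pool entry kind
      // (Methodref vs. InterfaceMethodref), which real resolution uses.
      return holder_is_interface ? ResolutionPath::interface_method
                                 : ResolutionPath::class_method;
  }
  return ResolutionPath::class_method;  // not reached; keeps compilers quiet
}
```

With purely holder-based dispatch, an invokevirtual against an interface holder would be sent down the interface path; keying on the bytecode is what lets that mismatch be reported.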


From tobias.hartmann at oracle.com  Thu Apr  7 06:13:06 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 7 Apr 2016 08:13:06 +0200
Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of
	<clinit>
In-Reply-To: <57053E5A.6050705@oracle.com>
References: <5704EA9A.7020202@oracle.com> <57053E5A.6050705@oracle.com>
Message-ID: <5705FA72.4030304@oracle.com>

Thanks, Vladimir.

Best regards,
Tobias

On 06.04.2016 18:50, Vladimir Kozlov wrote:
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 4/6/16 3:53 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153514
>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/
>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/
>>
>> The static initializer is not accessible through reflection and therefore cannot be compiled using the existing WhiteBox::enqueueMethodForCompilation feature.
>>
>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful <clinit> compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
>>
>> Thanks,
>> Tobias
>>

From rahul.v.raghavan at oracle.com  Thu Apr  7 06:43:08 2016
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Wed, 6 Apr 2016 23:43:08 -0700 (PDT)
Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in
	regmask.cpp
In-Reply-To: <C568518E7B433348B114B6A7122D474756E5038D@FMSMSX102.amr.corp.intel.com>
References: <f9892734-a3d1-4bce-b9c5-f7df762ba922@default>
	<56FC2A4B.5030905@oracle.com>
	<b8802940-b4cd-4c22-b9cd-a2045c5cafc5@default>
	<d9e683a3-662d-4da9-a9fc-2f7f667f0687@default>
	<C568518E7B433348B114B6A7122D474756E4F914@FMSMSX102.amr.corp.intel.com>
	<56FD74F2.2080102@oracle.com>
	<02d43f0c-f8a1-4f8c-9242-c110d78c314f@default>
	<C568518E7B433348B114B6A7122D474756E5038D@FMSMSX102.amr.corp.intel.com>
Message-ID: <e9d87222-457a-434f-b905-e8520bbe8589@default>

> -----Original Message-----
> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: Tuesday, April 05, 2016 1:35 AM
> 
> FYI
> 
> -----Original Message-----
> From: Berg, Michael C
> Sent: Monday, April 04, 2016 12:42 PM
> To: 'Rahul Raghavan' <rahul.v.raghavan at oracle.com>
> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp
> 
> Looks ok Rahul.

Thank you Michael.

> 
> Thanks,
> Michael
> 
> -----Original Message-----
> From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com]
> Sent: Monday, April 04, 2016 1:09 AM
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Dean Long <dean.long at oracle.com>; Berg, Michael C <michael.c.berg at intel.com>; Tobias Hartmann
> <tobias.hartmann at oracle.com>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp
> 
> Hi,
> 
> Please review the revised fix for JDK- 8149488.
> 
> <webrev.02>: http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/
> 
> Based on further checking, and thanks to clarifications from Michael, it was verified that the 8149488 issue can be fixed by just correcting
> the bitsInByte size to 256 in 'regmask.cpp' (and that the earlier mentioned case of extending the bitsInByte table size to 512 is not required).
> 
> Points from Michael for the record - "
>    > I believe Dean is right, I have debugged this and analyzed the usage model,
>    > we never made use of the upper components
>    > and register allocation has been right for VecZ for a good deal of time.
>    >
>    > All we need for a change is,
>    > Regmask.cpp:
>    >
>    > uint RegMask::Size() const {
>    >    extern uint8_t bitsInByte[256];
>    >
>    > A one line change.
>    >
>    > -Michael.
>    >
>    > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change.
>    > I tested SPECjvm2008, which has tons of EVEX code generated in it, on SKX
>    > where we make use of VecZ and the upper bank of registers."
> 
> So in this revised webrev.02, I defined BITS_IN_BYTE_ARRAY_SIZE as 256 and used it for the array size.
> 
> Confirmed no issues with 'JPRT -testset hotspot' run.
> 
> Thanks,
> Rahul
> 
> > -----Original Message-----
> > From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM
> >
> > Michael, isn't the correct size for this table 256?  I missed how VecZ
> > relates to the table size.
> >
> > dl
> >
> > On 3/31/2016 9:58 AM, Berg, Michael C wrote:
> > > Up until now we have gotten along with the size constraint only.
> > > Let us have both the size and the table though for completeness.
> > > I think we can leave the name though.
> > >
> > > -Michael
> > >
> > > -----Original Message-----
> > > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com]
> > > Sent: Thursday, March 31, 2016 9:18 AM
> > > To: Dean Long <dean.long at oracle.com>;
> > > hotspot-compiler-dev at openjdk.java.net; Berg, Michael C
> > > <michael.c.berg at intel.com>
> > > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in
> > > regmask.cpp
> > >
> > > Hi Michael,
> > >
> > > With respect to below thread, request help with some questions.
> > > Understood that the bitsInByte[] lookup table gives the number of bits set in the looked-up number and is used in VectorSet::Size().
> > > Also, the comment I got was that the bitsInByte table must be extended to
> > > 512 entries for consistent mapping of the VecZ register on targets that support it.
> > > But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here.
> > > I could not find any use of the extended set of values in the bitsInByte table. Is it for something to be used in the future?
> > >
> > > So for now it seems the original fix for 8149488, only correcting the size in RegMask::Size()
> > > (without extending the current bitsInByte array contents), is okay?
> > > (In any case, values above 0xFF are never used to index bitsInByte in RegMask::Size().)
> > >
> > >           ----- src/share/vm/libadt/vectset.hpp
> > >           +#define BITS_IN_BYTE_ARRAY_SIZE 256
> > >           +
> > >
> > >           ----- src/share/vm/opto/regmask.cpp
> > >           -  extern uint8_t bitsInByte[512];
> > >           +  extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];
> > >
> > >           ----- src/share/vm/libadt/vectset.cpp
> > >           -uint8_t bitsInByte[256] = {
> > >           +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = {
> > >
> > > I can send revised webrev for above if all okay. Please tell me if I am missing something.
> > >
> > >
> > > Or, if extending the bitsInByte array size and entries is indeed the correct fix, the array name should also be changed, right?
> > > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]')
> > >
> > > Thanks,
> > > Rahul
> > >
> > >> -----Original Message-----
> > >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM
> > >>
> > >>> -----Original Message-----
> > >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM
> > >>>
> > >>> When do we access elements 256 .. 511?  Wouldn't that mean we have
> > >>> 9-bit bytes?
> > >> Got your point Dean, Thanks.
> > >> I too got some questions here now; will check and reply soon.
> > >>
> > >> -Rahul
> > >>
> > >>> dl
> > >>>
> > >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote:
> > >>>> Hi,
> > >>>>
> > >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488.
> > >>>>
> > >>>> <bug>: https://bugs.openjdk.java.net/browse/JDK-8149488
> > >>>> <webrev.01>:
> > >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/
> > >>>>
> > >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512.
> > >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512.
> > >>>> Confirmed no issues with 'JPRT -testset hotspot' run.
> > >>>>
> > >>>> Thanks,
> > >>>> Rahul
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com]
> > >>>>> Sent: Thursday, February 11, 2016 11:32 PM
> > >>>>> To: hotspot-compiler-dev at openjdk.java.net
> > >>>>> Should we not extend:
> > >>>>>
> > >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp:
> > >>>>>    uint8_t bitsInByte[256] = { // ...
> > >>>>>
> > >>>>> to 512
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov'
> > >>>>>
> > >>>>> So how do we intend to map a VecZ register without 512 bits?
> > >>>>>
> > >>>>> -Michael
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: hotspot-compiler-dev
> > >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf
> > >>>>> Of Vladimir Ivanov
> > >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan;
> > >>>>> hotspot-compiler-dev at openjdk.java.net
> > >>>>>
> > >>>>> Rahul,
> > >>>>>
> > >>>>> Can we define a constant instead and use it in both places?
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Vladimir Ivanov
> > >>>>>
> > >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote:
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> Please review the patch for JDK- 8149488.
> > >>>>>>
> > >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488
> > >>>>>> Webrev:
> > >>>>>> http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/
> > >>>>>>
> > >>>>>> Corrected the bitsInByte array size in declaration.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Rahul
> > >>>>>>
> >
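Dean's question can be checked with a small model. The sketch below (C++14 or later; kBitsInByte, popcount32 and mask_size are hypothetical names, and the code only mirrors the idea of RegMask::Size(), not its actual source) builds the 256-entry table from one shared constant, as Vladimir Ivanov suggested, and counts mask bits one byte at a time. Every index is (word >> shift) & 0xFF, so it can never exceed 255; a wider register such as VecZ only sets more bits across more mask words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single constant shared by declaration and definition (cf. BITS_IN_BYTE_ARRAY_SIZE).
constexpr std::size_t BITS_IN_BYTE_ARRAY_SIZE = 256;

// Set-bit counts for every byte value 0..255, computed at compile time
// with the recurrence bits(i) = (i & 1) + bits(i >> 1).
struct BitsInByteTable {
  std::uint8_t v[BITS_IN_BYTE_ARRAY_SIZE];
  constexpr BitsInByteTable() : v{} {
    for (std::size_t i = 1; i < BITS_IN_BYTE_ARRAY_SIZE; ++i) {
      v[i] = static_cast<std::uint8_t>((i & 1) + v[i >> 1]);
    }
  }
};

constexpr BitsInByteTable kBitsInByte{};
static_assert(sizeof(kBitsInByte.v) == BITS_IN_BYTE_ARRAY_SIZE, "one entry per byte value");
static_assert(kBitsInByte.v[255] == 8, "all eight bits set");

// Count set bits in one 32-bit mask word, one byte at a time.
// Each index is masked with 0xFF, so it always lies in 0..255.
unsigned popcount32(std::uint32_t word) {
  unsigned n = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    n += kBitsInByte.v[(word >> shift) & 0xFFu];
  }
  return n;
}

// Size of a mask spread over several words (a RegMask-style Size()).
unsigned mask_size(const std::uint32_t* words, std::size_t count) {
  unsigned sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    sum += popcount32(words[i]);
  }
  return sum;
}
```

Deriving both the table and any extern declaration from the same constant means a mismatch like bitsInByte[512] vs. bitsInByte[256] cannot arise from two hand-written sizes drifting apart.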

From rahul.v.raghavan at oracle.com  Thu Apr  7 06:43:42 2016
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Wed, 6 Apr 2016 23:43:42 -0700 (PDT)
Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in
	regmask.cpp
In-Reply-To: <5702B3C5.8070507@oracle.com>
References: <f9892734-a3d1-4bce-b9c5-f7df762ba922@default>
	<56FC2A4B.5030905@oracle.com>
	<b8802940-b4cd-4c22-b9cd-a2045c5cafc5@default>
	<d9e683a3-662d-4da9-a9fc-2f7f667f0687@default>
	<C568518E7B433348B114B6A7122D474756E4F914@FMSMSX102.amr.corp.intel.com>
	<56FD74F2.2080102@oracle.com>
	<02d43f0c-f8a1-4f8c-9242-c110d78c314f@default>
	<5702B3C5.8070507@oracle.com>
Message-ID: <224945da-cb88-4653-af5d-2865ac42ee08@default>

> -----Original Message-----
> From: Dean Long > Sent: Tuesday, April 05, 2016 12:05 AM
> 
> Looks OK.

Thank you Dean.

> 
> dl
> 
> On 4/4/2016 1:09 AM, Rahul Raghavan wrote:
> > Hi,
> >
> > Please review the revised fix for JDK- 8149488.
> >
> > <webrev.02>: http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/
> >
> > Based on further checking and thanks to clarifications from Michael,
> > it was verified that the 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp'
> > (and that the earlier mentioned case of extending the bitsInByte table size to 512 is not required).
> >
> > Points from Michael for the record - "
> >     > I believe Dean is right, I have debugged this and analyzed the usage model,
> >     > we never made use of the upper components
> >     > and register allocation has been right for VecZ for a good deal of time.
> >     >
> >     > All we need for a change is,
> >     > Regmask.cpp:
> >     >
> >     > uint RegMask::Size() const {
> >     >    extern uint8_t bitsInByte[256];
> >     >
> >     > A one line change.
> >     >
> >     > -Michael.
> >     >
> >     > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change.
> >     > I tested SPECjvm2008, which has tons of EVEX code generated in it, on SKX
> >     > where we make use of VecZ and the upper bank of registers."
> >
> > So in this revised webrev.02, I defined BITS_IN_BYTE_ARRAY_SIZE as 256 and used it for the array size.
> >
> > Confirmed no issues with 'JPRT -testset hotspot' run.
> >
> > Thanks,
> > Rahul
> >
> >> -----Original Message-----
> >> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM
> >>
> >> Michael, isn't the correct size for this table 256?  I missed how VecZ
> >> relates to the table size.
> >>
> >> dl
> >>
> >> On 3/31/2016 9:58 AM, Berg, Michael C wrote:
> >>> Up until now we have gotten along with the size constraint only.
> >>> Let us have both the size and the table though for completeness.
> >>> I think we can leave the name though.
> >>>
> >>> -Michael
> >>>
> >>> -----Original Message-----
> >>> From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com]
> >>> Sent: Thursday, March 31, 2016 9:18 AM
> >>> To: Dean Long <dean.long at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C <michael.c.berg at intel.com>
> >>> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp
> >>>
> >>> Hi Michael,
> >>>
> >>> With respect to below thread, request help with some questions.
> >>> I understand that the bitsInByte[] lookup table gives the number of bits set in the looked-up byte value and is used in VectorSet::Size().
> >>> I also got a comment requesting that the bitsInByte table be extended to 512 entries, for consistent mapping of the VecZ registers on targets that support them.
> >>> But currently, for the uses of bitsInByte in VectorSet::Size() and RegMask::Size(), only the original 256 entries are needed.
> >>> I could not find any use of an extended set of values in the bitsInByte table. Is it intended for something in the future?
> >>>
> >>> So for now, the original fix for 8149488 (only correcting the size in RegMask::Size(), without extending the current bitsInByte array contents) seems to be okay?
> >>> (In any case, values above 0xFF are currently never used to index bitsInByte in RegMask::Size().)
> >>>            ----- src/share/vm/libadt/vectset.hpp
> >>>            +#define BITS_IN_BYTE_ARRAY_SIZE 256
> >>>            +
> >>>
> >>>            ----- src/share/vm/opto/regmask.cpp
> >>>            -  extern uint8_t bitsInByte[512];
> >>>            +  extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];
> >>>
> >>>            ----- src/share/vm/libadt/vectset.cpp
> >>>            -uint8_t bitsInByte[256] = {
> >>>            +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = {
> >>>
> >>> I can send a revised webrev for the above if all is okay. Please tell me if I am missing something.
> >>>
> >>>
> >>> Or else, if extending the bitsInByte array size and entries is indeed the correct fix, the array name should also be changed, right?
> >>> (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]')
> >>>
> >>> Thanks,
> >>> Rahul
> >>>
> >>>> -----Original Message-----
> >>>> From: Rahul Raghavan
> >>>> Sent: Thursday, March 31, 2016 6:49 PM
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Dean Long
> >>>>> Sent: Thursday, March 31, 2016 1:05 AM
> >>>>>
> >>>>> When do we access elements 256 .. 511?  Wouldn't that mean we have
> >>>>> 9-bit bytes?
> >>>> Got your point Dean, Thanks.
> >>>> I too got some questions here now; will check and reply soon.
> >>>>
> >>>> -Rahul
> >>>>
> >>>>> dl
> >>>>>
> >>>>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> With respect to below email thread, request help to review revised webrev.01 for 8149488.
> >>>>>>
> >>>>>> <bug>: https://bugs.openjdk.java.net/browse/JDK-8149488
> >>>>>> <webrev.01>:
> >>>>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/
> >>>>>>
> >>>>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512.
> >>>>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512.
> >>>>>> Confirmed no issues with 'JPRT -testset hotspot' run.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Rahul
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com]
> >>>>>>> Sent: Thursday, February 11, 2016 11:32 PM
> >>>>>>> To: hotspot-compiler-dev at openjdk.java.net
> >>>>>>> Should we not extend:
> >>>>>>>
> >>>>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp:
> >>>>>>>     uint8_t bitsInByte[256] = { // ...
> >>>>>>>
> >>>>>>> to 512
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Berg, Michael C
> >>>>>>> Sent: Thursday, February 11, 2016 9:57 AM
> >>>>>>> To: 'Vladimir Ivanov'
> >>>>>>>
> >>>>>>> So how do we intend to map a VecZ register without 512 bits?
> >>>>>>>
> >>>>>>> -Michael
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov
> >>>>>>> Sent: Thursday, February 11, 2016 4:54 AM
> >>>>>>> To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net
> >>>>>>>
> >>>>>>> Rahul,
> >>>>>>>
> >>>>>>> Can we define a constant instead and use it in both places?
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Vladimir Ivanov
> >>>>>>>
> >>>>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Please review the patch for JDK- 8149488.
> >>>>>>>>
> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488
> >>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/
> >>>>>>>>
> >>>>>>>> Corrected the bitsInByte array size in declaration.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Rahul
> >>>>>>>>
> 
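Below is a minimal C++ sketch of the lookup-table idea discussed in this thread. It assumes a standalone table, not the actual definition in vectset.cpp. `kBitsInByte` and `popcount_by_bytes` are hypothetical names, and `BITS_IN_BYTE_ARRAY_SIZE` mirrors the constant proposed in webrev.02.

Each index is taken from an 8-bit slice of the word, so it can only range over 0..255. This is Dean's point about 9-bit bytes: entries 256..511 of a 512-entry table could never be read.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the proposed constant: one table entry per
// possible value of an 8-bit byte, i.e. 2^8 = 256 entries.
constexpr unsigned BITS_IN_BYTE_ARRAY_SIZE = 1u << 8;

// kBitsInByte[i] holds the number of set bits in i (0 <= i <= 255),
// built with the recurrence bits(i) = (i & 1) + bits(i >> 1).
static const std::array<std::uint8_t, BITS_IN_BYTE_ARRAY_SIZE> kBitsInByte = [] {
  std::array<std::uint8_t, BITS_IN_BYTE_ARRAY_SIZE> table{};
  for (unsigned i = 1; i < BITS_IN_BYTE_ARRAY_SIZE; i++) {
    table[i] = static_cast<std::uint8_t>((i & 1u) + table[i >> 1]);
  }
  return table;
}();

// Counts the set bits of a 32-bit word one byte at a time. Every index is
// masked with 0xFF, so it always lies in 0..255.
unsigned popcount_by_bytes(std::uint32_t word) {
  unsigned sum = 0;
  for (unsigned shift = 0; shift < 32; shift += 8) {
    sum += kBitsInByte[(word >> shift) & 0xFFu];
  }
  return sum;
}
```

The sketch derives the size from the byte width and uses that one constant wherever the table is declared or defined. That follows Vladimir's suggestion of defining a constant and using it in both places, so the declaration and the definition cannot drift apart again.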

From aleksey.shipilev at oracle.com  Thu Apr  7 07:30:36 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 7 Apr 2016 10:30:36 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <56FE87C2.50002@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
Message-ID: <57060C9C.4000000@oracle.com>

On 04/01/2016 05:37 PM, Aleksey Shipilev wrote:
> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote:
>> I would like to solicit comments for C1 support for new
>> Unsafe.compareAndExchange intrinsics (we have support for them in C2).
>> The rest of new Unsafe methods that are not intrinsified by C1 are
>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be
>> emulated with existing APIs.
>>
>> Bug:
>>   https://bugs.openjdk.java.net/browse/JDK-8152753
>>
>> Webrev:
>>   http://cr.openjdk.java.net/~shade/8152753/webrev.00/
> 
> Update:
>   http://cr.openjdk.java.net/~shade/8152753/webrev.01/
> 
> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some
> other cleanups.
> 
> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT
> hs-comp testset (some unrelated timeouts on SPARC).

Anyone?

Thanks,
-Aleksey



From jamsheed.c.m at oracle.com  Thu Apr  7 08:41:34 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Thu, 7 Apr 2016 14:11:34 +0530
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
Message-ID: <57061D3E.8050408@oracle.com>

Hi Martin,

One comment:

The count increment should use a release store plus an atomic update.
ref: 
https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming

Best Regards
Jamsheed

On 4/6/2016 6:54 PM, Doerr, Martin wrote:
>
> Hi Jamsheed and all,
>
> thanks for your explanation.
>
> About Case 1:
>
> I basically agree that reading _next==NULL or _count==0 only
> leads to false negatives and is not critical.
>
> Yes, we could live with a few more false negatives on weak memory 
> model platforms (even though this is not my preferred design).
>
> About Case 2:
>
> What I'm missing on the reader's side of the _count field is something
> which prevents processors from speculatively loading the contents of
> the ExceptionCache.
>
> In ExceptionCache::test_address, the _count only affects the control flow.
>
> PPC and ARM processors can predict branches which depend on the _count 
> field and load speculatively from the pc and handler fields (which may 
> be stale data!).
>
> Due to out-of-order execution of the loads, it can actually happen, 
> that the new _count value is observed, but stale data is read from pc 
> and handler fields.
>
> I guess it is highly unlikely that we will ever observe this, but
> there's no guarantee.
>
> I think my concern about using non-volatile fields for the 
> _exception_cache is also still valid.
>
> Nothing prevents C++ Compilers from loading the pointer twice from 
> memory. They may expect to get the pointer to the same instance both 
> times but actually get two different ones.
>
> For example, this may lead to the situation that
> handler_for_exception_and_pc uses one ExceptionCache instance for
> calling the match function and another one (due to a reload of the
> non-volatile field) for calling next().
>
> Would other people like to comment on this lengthy discussion as well?
>
> Best regards,
>
> Martin
>
> *From:*Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
> *Sent:* Mittwoch, 6. April 2016 13:54
> *To:* Doerr, Martin <martin.doerr at sap.com>; 
> hotspot-compiler-dev at openjdk.java.net
> *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not 
> multi-thread safe
>
> Hi Martin,
>
> On 4/6/2016 2:49 PM, Doerr, Martin wrote:
>
>     Hi Jamsheed,
>
>     here are the cases of add_handler_for_exception_and_pc we should
>     talk about:
>
>     Case 1: A new ExceptionCache instance needs to get added.
>
>     The storestore barrier you have added is used in the constructor
>     of the ExceptionCache and it releases the most critical fields of
>     it. I think this is what you explained in [1] in your email below.
>
>     The new values of _count and _next fields are written afterwards
>     and hence not covered by this release barrier. Readers of the
>     _exception_cache may read _count==0 or _next==NULL.
>
>     One could argue that this is not critical, but I guess this was
>     not intended?
>
>     At least the _exception_cache field needs to be volatile to
>     prevent optimizers from breaking anything. This is always needed
>     for fields which are accessed concurrently by multiple threads
>     without locks (as the readers do).
>
>     I think releasing the completely initialized ExceptionCache
>     instance is a much cleaner design.
>
> Having count < actual entries, or having _next = null, is OK (as there
> is always the (locked) slow path to check again).
> Quoting the comment from the read path:
>
>       // We never grab a lock to read the exception cache, so we may
>       // have false negatives. This is okay, as it can only happen during
>       // the first few exception lookups for a given nmethod.
>
> Weak memory platforms may have a few more false negatives, but isn't
> that OK?
> This helps us, as we can remove volatile from the picture, which is actually
> good for the read paths.
>
>
>     Case 2: An existing ExceptionCache instance gets a new entry.
>
>     In this case your storestore barrier is good to release all
>     updated fields. However, we need to consider the readers, too. The
>     _count field needs to be volatile and the load must acquire.
>     Otherwise, stale data may get read by processors which perform
>     loads on speculative paths.
>
> The storestore memory barrier handles this, as count <= number of real entries,
> and there is always the locked slow path to check again.
> As said before, there may be a few more false negatives on weak memory
> platforms than on strong ones.
>
> Best Regards,
> Jamsheed
>
>
>     I have added the acquire barrier for the _count field here:
>
>     http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/
>     <http://cr.openjdk.java.net/%7Emdoerr/8153267_exception_cache/webrev.01/>
>
>     Does this answer your questions or is anything still unclear?
>
>     Best regards,
>
>     Martin
>
>     *From:*Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
>     *Sent:* Mittwoch, 6. April 2016 10:11
>     *To:* Doerr, Martin <martin.doerr at sap.com>
>     <mailto:martin.doerr at sap.com>;
>     hotspot-compiler-dev at openjdk.java.net
>     <mailto:hotspot-compiler-dev at openjdk.java.net>
>     *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not
>     multi-thread safe
>
>     Thanks for the reply. Trying to understand the details.
>
>
>
>         void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) {
>           // There are potential race conditions during exception cache updates, so we
>           // must own the ExceptionCache_lock before doing ANY modifications. Because
>           // we don't lock during reads, it is possible to have several threads attempt
>           // to update the cache with the same data. We need to check for already inserted
>           // copies of the current data before adding it.
>
>           MutexLocker ml(ExceptionCache_lock);
>           ExceptionCache* target_entry = exception_cache_entry_for_exception(exception);
>
>           if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) {
>             target_entry = new ExceptionCache(exception,pc,handler);
>             add_exception_cache_entry(target_entry);
>           }
>         }
>
>
>     [1] There is a storestore memory barrier before count is updated in
>     add_address_and_handler. This ensures that the exception pc and handler
>     address are updated before count is incremented and before the
>     ExceptionCache entry is linked in (at nm->_exception_cache or in the
>     list via ec->_next).
>
>
>
>         address nmethod::handler_for_exception_and_pc(Handle exception, address pc) {
>           // We never grab a lock to read the exception cache, so we may
>           // have false negatives. This is okay, as it can only happen during
>           // the first few exception lookups for a given nmethod.
>           ExceptionCache* ec = exception_cache();
>           while (ec != NULL) {
>             address ret_val;
>             if ((ret_val = ec->match(exception,pc)) != NULL) {
>               return ret_val;
>             }
>             ec = ec->next();
>           }
>           return NULL;
>         }
>
>
>     And in the read logic, we first check that the ec entry is available
>     (non-NULL check) before proceeding further.
>     If ec is non-NULL, then ec_type, exception pc, and handler are
>     available by [1], though count can be reordered and not yet updated
>     with the new value.
>
>     This fixes the issue. Why do you think it doesn't?
>
>     Best Regards,
>     Jamsheed
>
>
>     On 4/5/2016 3:40 PM, Doerr, Martin wrote:
>
>         Hi Jamsheed,
>
>         thanks for pointing me to it. Interesting that you have found
>         such a problem so shortly before me :)
>
>         My webrev addresses some aspects which are not covered by your
>         fix:
>
>         -add_handler_for_exception_and_pc adds a new ExceptionCache
>         instance in the other case. They need to get released as well.
>
>         -The readers of the _exception_cache field are not safe, yet.
>         As Andrew Haley pointed out, optimizers may modify load
>         accesses for non-volatile fields.
>
>         So I think my change is still needed.
>
>         And after taking a closer look at your change, I think the
>         _count field which is addressed by your fix needs to be
>         volatile as well. I can incorporate that in my change if you like.
>
>         Would you agree?
>
>         Best regards,
>
>         Martin
>
>         *From:*hotspot-compiler-dev
>         [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On
>         Behalf Of *Jamsheed C m
>         *Sent:* Montag, 4. April 2016 08:14
>         *To:* hotspot-compiler-dev at openjdk.java.net
>         <mailto:hotspot-compiler-dev at openjdk.java.net>
>         *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not
>         multi-thread safe
>
>         Hi Martin,
>
>         "nmethod's exception cache not multi-thread safe"  bug is
>         fixed in b107
>         bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
>         fix changeset:
>         http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
>         discussion link:
>         http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html
>
>         Best Regards,
>         Jamsheed
>
>         On 4/1/2016 6:07 PM, Doerr, Martin wrote:
>
>             Hello everyone,
>
>             we have found a concurrency problem with the nmethod's
>             exception cache. Readers of the cache may read stale data
>             on weak memory platforms.
>
>             The writers of the cache are synchronized by locks, but
>             there may be concurrent readers: The compiler runtimes use
>             nmethod::handler_for_exception_and_pc to access the cache
>             without locking.
>
>             Therefore, the nmethod's field _exception_cache needs to
>             be volatile and adding new entries must be done by
>             releasing stores. (Loading seems to be fine without
>             acquire because there's an address dependency from the
>             load of the cache to the usage of its contents which is
>             sufficient to ensure ordering on all openjdk platforms.)
>
>             I also added a minor cleanup: I changed nmethod::is_alive
>             to read the volatile field _state only once. It is
>             certainly undesired to force the compiler to load it from
>             memory twice.
>
>             Webrev is here:
>
>             http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/
>             <http://cr.openjdk.java.net/%7Emdoerr/8153267_exception_cache/webrev.00/>
>
>             Please review. I will also need a sponsor.
>
>             Best regards,
>
>             Martin
>


From aph at redhat.com  Thu Apr  7 08:51:34 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 7 Apr 2016 09:51:34 +0100
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57060C9C.4000000@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com>
Message-ID: <57061F96.6080001@redhat.com>

I'm very tempted to review this, but there is a rather odd thing:
the bug does not explain the motivation for this change.

Andrew.


From aleksey.shipilev at oracle.com  Thu Apr  7 08:55:32 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 7 Apr 2016 11:55:32 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57061F96.6080001@redhat.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com>
Message-ID: <57062084.4020209@oracle.com>

On 04/07/2016 11:51 AM, Andrew Haley wrote:
> I'm very tempted to review this, but there is a rather odd thing:
> the bug does not explain the motivation for this change.

Not following you.

What motivation do you need apart from "...the rest of new Unsafe
methods that are not intrinsified by C1 are handled by Java fallbacks in
Unsafe.java. compareAndExchange cannot be emulated with existing APIs"?
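Below is a minimal sketch of why a fallback works for some methods but not for compareAndExchange. It uses C++ `std::atomic` as a stand-in for the Unsafe primitives, and all function names are hypothetical, not Unsafe.java's.

- A get-and-add style method can be emulated with a retry loop over a boolean compare-and-set.
- compareAndExchange must return the witness value observed by the same atomic operation. A separate re-load after a failed CAS does not provide that.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical boolean primitive, analogous to compareAndSet.
bool compare_and_set(std::atomic<std::int64_t>& cell, std::int64_t expected,
                     std::int64_t desired) {
  return cell.compare_exchange_strong(expected, desired);
}

// Emulating getAndAdd needs only compareAndSet: retry until the CAS from the
// value we read succeeds; that value is exactly the one we replaced.
std::int64_t get_and_add(std::atomic<std::int64_t>& cell, std::int64_t delta) {
  std::int64_t current = cell.load();
  while (!compare_and_set(cell, current, current + delta)) {
    current = cell.load();
  }
  return current;
}

// A naive compareAndExchange built from the same pieces. The re-load after a
// failed CAS is a separate read, so the returned value is not necessarily the
// witness that made the CAS fail.
std::int64_t racy_compare_and_exchange(std::atomic<std::int64_t>& cell,
                                       std::int64_t expected,
                                       std::int64_t desired) {
  if (compare_and_set(cell, expected, desired)) {
    return expected;  // success: the witness equals `expected`
  }
  return cell.load();  // not atomic with the failed CAS above
}
```

In a concurrent setting, another thread can store `expected` between the failed CAS and the re-load. `racy_compare_and_exchange` may then return `expected` even though no exchange happened.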

Thanks,
-Aleksey



From martin.doerr at sap.com  Thu Apr  7 09:08:08 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 7 Apr 2016 09:08:08 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <57061D3E.8050408@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
Message-ID: <f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>

Hi Jamsheed,

An atomic update of _count would only be required if there were multiple threads which attempt to increment it concurrently. However, updates are done under the lock, so we only have concurrent readers, which is ok.

I still think "volatile" does what we need here. Especially the xlC compiler on AIX tends to reload variables from memory. Exactly this can be prevented by making the field volatile.

People who don't like volatile should come up with a different solution, please.
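Below is a minimal sketch of the single-writer argument. It uses C++11 `std::atomic` in place of HotSpot's `volatile` fields and OrderAccess barriers. The `Bucket` type, its fixed capacity and its field names are hypothetical simplifications, not the real ExceptionCache.

- Writers are serialized by the lock, so the count can be bumped with a plain read followed by a release store.
- The lock-free readers pair that store with an acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified stand-in for an ExceptionCache-like bucket:
// a fixed number of (pc, handler) slots, filled under a lock, read without one.
struct Bucket {
  static constexpr int kCapacity = 4;
  const void* pc[kCapacity] = {};
  const void* handler[kCapacity] = {};
  std::atomic<int> count{0};
  std::mutex lock;  // stand-in for ExceptionCache_lock

  // Writer: all writers hold `lock`, so no atomic read-modify-write is needed.
  // The release store makes the slot contents visible before the new count.
  bool add(const void* p, const void* h) {
    std::lock_guard<std::mutex> guard(lock);
    int n = count.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;  // caller would allocate a new bucket
    pc[n] = p;
    handler[n] = h;
    count.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader: no lock. The acquire load pairs with the release store above,
  // so every slot below the observed count is fully initialized.
  const void* match(const void* p) const {
    int n = count.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pc[i] == p) return handler[i];
    }
    return nullptr;
  }
};
```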

Best regards,
Martin


From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
Sent: Donnerstag, 7. April 2016 10:42
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Martin,

One comment:

The count increment should use a release store plus an atomic update.
ref: https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming

Best Regards
Jamsheed
On 4/6/2016 6:54 PM, Doerr, Martin wrote:
Hi Jamsheed and all,

thanks for your explanation.

About Case 1:
I basically agree that reading _next==NULL or _count==0 only leads to false negatives and is not critical.
Yes, we could live with a few more false negatives on weak memory model platforms (even though this is not my preferred design).

About Case 2:
What I'm missing on the reader's side of the _count field is something which prevents processors from speculatively loading the contents of the ExceptionCache.

In ExceptionCache::test_address, the _count only affects the control flow.
PPC and ARM processors can predict branches which depend on the _count field and load speculatively from the pc and handler fields (which may be stale data!).
Due to out-of-order execution of the loads, it can actually happen, that the new _count value is observed, but stale data is read from pc and handler fields.
I guess it is highly unlikely that we will ever observe this, but there's no guarantee.
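The reader-side issue can be sketched with explicit fences, again with `std::atomic` as an assumed stand-in for HotSpot's barriers. `Slot` is a hypothetical single-entry type.

- A release fence before the count store plays roughly the role of the storestore barrier.
- The acquire fence after the count load is the reader-side counterpart.
- On PPC or ARM, a branch on the loaded count alone does not order the later loads.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical single-entry slot. Payload fields are relaxed atomics so the
// sketch has no data races under the C++ memory model.
struct Slot {
  std::atomic<const void*> pc{nullptr};
  std::atomic<const void*> handler{nullptr};
  std::atomic<int> count{0};

  // Writer (assumed to hold the lock): payload first, then the count.
  void publish(const void* p, const void* h) {
    pc.store(p, std::memory_order_relaxed);
    handler.store(h, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // roughly storestore
    count.store(1, std::memory_order_relaxed);
  }

  // Reader (no lock): without the acquire fence, the payload loads depend on
  // count only through a branch, which weakly ordered CPUs may speculate past.
  const void* lookup(const void* p) const {
    if (count.load(std::memory_order_relaxed) == 0) return nullptr;
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with release
    if (pc.load(std::memory_order_relaxed) != p) return nullptr;
    return handler.load(std::memory_order_relaxed);
  }
};
```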

I think my concern about using non-volatile fields for the _exception_cache is also still valid.
Nothing prevents C++ Compilers from loading the pointer twice from memory. They may expect to get the pointer to the same instance both times but actually get two different ones.
For example, this may lead to the situation that handler_for_exception_and_pc uses one ExceptionCache instance for calling the match function and another one (due to a reload of the non-volatile field) for calling next().
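Below is a sketch of the load-once discipline for the list traversal, with `std::atomic` standing in for `volatile`. The `Entry` and `Method` types are hypothetical. Each shared pointer is loaded exactly once into a local, so the same instance is used both for the match test and for following `next`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical list node standing in for ExceptionCache.
struct Entry {
  const void* pc;
  const void* handler;
  std::atomic<Entry*> next{nullptr};
  Entry(const void* p, const void* h) : pc(p), handler(h) {}
};

struct Method {
  std::atomic<Entry*> cache_head{nullptr};  // stand-in for _exception_cache

  const void* handler_for(const void* pc) const {
    // `ec` is read once per step and then used for both the match and next.
    // With a plain field, the compiler may assume no concurrent writers and
    // re-load it, possibly observing a different instance.
    for (Entry* ec = cache_head.load(std::memory_order_acquire); ec != nullptr;
         ec = ec->next.load(std::memory_order_acquire)) {
      if (ec->pc == pc) return ec->handler;
    }
    return nullptr;
  }
};
```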

Would other people like to comment on this lengthy discussion as well?

Best regards,
Martin


From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
Sent: Mittwoch, 6. April 2016 13:54
To: Doerr, Martin <martin.doerr at sap.com><mailto:martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Martin,

On 4/6/2016 2:49 PM, Doerr, Martin wrote:


Hi Jamsheed,

here are the cases of add_handler_for_exception_and_pc we should talk about:

Case 1: A new ExceptionCache instance needs to get added.
The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below.
The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL.
One could argue that this is not critical, but I guess this was not intended?

At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do).
I think releasing the completely initialized ExceptionCache instance is a much cleaner design.
Having count < actual entries, or having _next = null, is OK (as there is always the (locked) slow path to check again).
Quoting the comment from the read path:


  // We never grab a lock to read the exception cache, so we may
  // have false negatives. This is okay, as it can only happen during
  // the first few exception lookups for a given nmethod.
Weak memory platforms may have a few more false negatives, but isn't that OK?
This helps us, as we can remove volatile from the picture, which is actually good for the read paths.
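The fast-path/slow-path argument can be sketched as follows. The `HandlerCache` class is hypothetical: it uses plain `int` keys for brevity and models only the lookup structure, not HotSpot's code. A lookup that misses without the lock only costs a trip through the locked slow path, which rechecks before computing and inserting.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

class HandlerCache {
 public:
  static constexpr int kMiss = -1;
  static constexpr int kCapacity = 8;

  template <typename Compute>
  int lookup_or_compute(int pc, Compute compute) {
    int h = fast_lookup(pc);
    if (h != kMiss) return h;                 // lock-free fast path
    std::lock_guard<std::mutex> guard(lock_);
    h = fast_lookup(pc);                      // recheck: another thread may have added it
    if (h != kMiss) return h;
    h = compute(pc);                          // expensive search, done once per pc while there is room
    int n = count_.load(std::memory_order_relaxed);
    if (n < kCapacity) {
      pcs_[n] = pc;
      handlers_[n] = h;
      count_.store(n + 1, std::memory_order_release);
    }
    return h;
  }

 private:
  // May return kMiss for an entry that is being added concurrently (a false
  // negative); callers then fall through to the locked slow path.
  int fast_lookup(int pc) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return kMiss;
  }

  int pcs_[kCapacity] = {};
  int handlers_[kCapacity] = {};
  std::atomic<int> count_{0};
  std::mutex lock_;
};
```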


Case 2: An existing ExceptionCache instance gets a new entry.
In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths.
The storestore memory barrier handles this, as count <= number of real entries, and there is always the locked slow path to check again.
As said before, there may be a few more false negatives on weak memory platforms than on strong ones.

Best Regards,
Jamsheed


I have added the acquire barrier for the _count field here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/

Does this answer your questions or is anything still unclear?

Best regards,
Martin


From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com]
Sent: Mittwoch, 6. April 2016 10:11
To: Doerr, Martin <martin.doerr at sap.com><mailto:martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Thanks for the reply. Trying to understand the details.


void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) {
  // There are potential race conditions during exception cache updates, so we
  // must own the ExceptionCache_lock before doing ANY modifications. Because
  // we don't lock during reads, it is possible to have several threads attempt
  // to update the cache with the same data. We need to check for already inserted
  // copies of the current data before adding it.

  MutexLocker ml(ExceptionCache_lock);
  ExceptionCache* target_entry = exception_cache_entry_for_exception(exception);

  if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) {
    target_entry = new ExceptionCache(exception,pc,handler);
    add_exception_cache_entry(target_entry);
  }
}

[1] There is a storestore memory barrier before count is updated in add_address_and_handler.
This ensures that the exception pc and handler address are updated before count is incremented and before the ExceptionCache entry is linked in (at nm->_exception_cache or in the list via ec->_next).


address nmethod::handler_for_exception_and_pc(Handle exception, address pc) {
  // We never grab a lock to read the exception cache, so we may
  // have false negatives. This is okay, as it can only happen during
  // the first few exception lookups for a given nmethod.
  ExceptionCache* ec = exception_cache();
  while (ec != NULL) {
    address ret_val;
    if ((ret_val = ec->match(exception,pc)) != NULL) {
      return ret_val;
    }
    ec = ec->next();
  }
  return NULL;
}

And in the read logic, we first check that the ec entry is available (non-NULL check) before proceeding further.
If ec is non-NULL, then ec_type, exception pc, and handler are available by [1], though count can be reordered and not yet updated with the new value.

This fixes the issue. Why do you think it doesn't?

Best Regards,
Jamsheed


On 4/5/2016 3:40 PM, Doerr, Martin wrote:
Hi Jamsheed,

thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :)

My webrev addresses some aspects which are not covered by your fix:

-          add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well.

-          The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields.

So I think my change is still needed.

And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like.
Would you agree?

Best regards,
Martin


From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m
Sent: Montag, 4. April 2016 08:14
To: hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Martin,

"nmethod's exception cache not multi-thread safe"  bug is fixed in b107
bug id: https://bugs.openjdk.java.net/browse/JDK-8143897
fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9
discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html

Best Regards,
Jamsheed
On 4/1/2016 6:07 PM, Doerr, Martin wrote:
Hello everyone,

we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.

The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.)
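Below is a minimal sketch of publishing a fully initialized node. It assumes new entries are prepended to a singly linked list. `Entry` and `ExceptionCacheList` are hypothetical names, and `std::atomic` stands in for a volatile field plus a release store.

- The node, including its `next` pointer, is completely written before the single release store that makes it reachable.
- The reader uses an acquire load. C++ would express the address-dependency ordering described above as `memory_order_consume`, but current compilers treat consume as acquire, so acquire is the conservative spelling.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Entry {
  const void* pc;
  const void* handler;
  Entry* next;  // written before publication, never changed afterwards here
  Entry(const void* p, const void* h, Entry* n) : pc(p), handler(h), next(n) {}
};

class ExceptionCacheList {
 public:
  // Writer (under the lock): build the whole node, then publish it with one
  // release store, so readers never see a partially initialized entry.
  void add(const void* pc, const void* handler) {
    std::lock_guard<std::mutex> guard(lock_);
    Entry* head = head_.load(std::memory_order_relaxed);
    head_.store(new Entry(pc, handler, head), std::memory_order_release);
  }

  // Reader (no lock): the acquire load makes the node, and every node
  // published before it, visible.
  const void* find(const void* pc) const {
    for (Entry* e = head_.load(std::memory_order_acquire); e != nullptr; e = e->next) {
      if (e->pc == pc) return e->handler;
    }
    return nullptr;
  }

  // Assumes no concurrent readers remain when the list is destroyed.
  ~ExceptionCacheList() {
    Entry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      Entry* next = e->next;
      delete e;
      e = next;
    }
  }

 private:
  std::atomic<Entry*> head_{nullptr};
  std::mutex lock_;
};
```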

I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
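The `is_alive` cleanup amounts to taking one snapshot of the shared field. Below is a sketch with a hypothetical `MethodState` type. Its state values are loosely modeled on, but not copied from, nmethod's.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical state values for illustration only.
enum State : int { in_use = 0, not_entrant = 1, zombie = 2, unloaded = 3 };

struct MethodState {
  std::atomic<int> state{in_use};  // stand-in for a volatile _state field

  // Two separate loads of the shared field: two memory accesses.
  bool is_alive_two_loads() const {
    return state.load() == in_use || state.load() == not_entrant;
  }

  // One load into a local; both comparisons test the same snapshot.
  bool is_alive_one_load() const {
    int s = state.load();
    return s == in_use || s == not_entrant;
  }
};
```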

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/

Please review. I will also need a sponsor.

Best regards,
Martin



From aph at redhat.com  Thu Apr  7 09:33:23 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 7 Apr 2016 10:33:23 +0100
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57062084.4020209@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com>
	<57062084.4020209@oracle.com>
Message-ID: <57062963.5070403@redhat.com>

On 07/04/16 09:55, Aleksey Shipilev wrote:
> On 04/07/2016 11:51 AM, Andrew Haley wrote:
>> I'm very tempted to review this, but there is a rather odd thing:
>> the bug does not explain the motivation for this change.
> 
> Not following you.
> 
> What motivation do you need apart from "...the rest of new Unsafe
> methods that are not intrinsified by C1 are handled by Java fallbacks in
> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"?

As far as I know all of these are handled by native methods.
Is that not correct?  They seem to be.

Andrew.


From aleksey.shipilev at oracle.com  Thu Apr  7 09:48:48 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 7 Apr 2016 12:48:48 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57062963.5070403@redhat.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com>
	<57062084.4020209@oracle.com> <57062963.5070403@redhat.com>
Message-ID: <57062D00.5000609@oracle.com>

On 04/07/2016 12:33 PM, Andrew Haley wrote:
> On 07/04/16 09:55, Aleksey Shipilev wrote:
>> On 04/07/2016 11:51 AM, Andrew Haley wrote:
>>> I'm very tempted to review this, but there is a rather odd thing:
>>> the bug does not explain the motivation for this change.
>>
>> Not following you.
>>
>> What motivation do you need apart from "...the rest of new Unsafe
>> methods that are not intrinsified by C1 are handled by Java fallbacks in
>> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"?
> 
> As far as I know all of these are handled by native methods.
> Is that not correct?  They seem to be.

Yes. This is not a question of implementing CompareAndExchange: it is
handled well by the unsafe.cpp bits, and this is why the runtime can work
even without the compilers in the picture.

This is about getting CompareAndExchange support in C1 in sync with
CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native
calls, and should do the same for new CompareAndExchange.

In other words, across the VarHandles accessor methods, everything is
covered either by Unsafe.java shortcuts or by C1/C2 intrinsics that
replace unsafe.cpp calls, except for CompareAndExchange, until this RFE.
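The semantic difference between the two operations can be sketched in portable C++ with std::atomic. This is a stand-in, not the Unsafe or HotSpot implementation, and the function names are hypothetical. A compare-and-set reports only whether it succeeded, while a compare-and-exchange returns the value it actually found in memory, known as the witness.

```cpp
#include <atomic>
#include <cassert>

// compareAndSet-style: reports only success or failure.
template <typename T>
bool compare_and_set(std::atomic<T>& cell, T expected, T desired) {
    return cell.compare_exchange_strong(expected, desired);
}

// compareAndExchange-style: returns the witness value. On success it equals
// `expected`; on failure compare_exchange_strong has written the current
// value of the cell into `expected`.
template <typename T>
T compare_and_exchange(std::atomic<T>& cell, T expected, T desired) {
    cell.compare_exchange_strong(expected, desired);
    return expected;
}
```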

Thanks,
-Aleksey


From aph at redhat.com  Thu Apr  7 10:14:23 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 7 Apr 2016 11:14:23 +0100
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
	<f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
Message-ID: <570632FF.7090103@redhat.com>

On 07/04/16 10:08, Doerr, Martin wrote:

> atomic update for the _count would only be required if there were
> multiple threads which attempt to increment it
> concurrently. However, updates are under lock, so we only have
> concurrent readers which is ok. 
> 
> I still think "volatile" does what we need here. Especially the xlC
> compiler on AIX tends to reload variables from memory. Exactly this
> can be prevented by making the field volatile.

I think your latest patch is OK.  Whether volatile is really good
enough, I don't know.  The new(ish) C++ memory model treats this as a
race, and therefore undefined behaviour.  Old C++ didn't have a memory
model, so the best we can do with racy code is guess about what our
compilers might do.

I certainly much prefer a release_store to the storestore fence used
in the fix for 8143897.

Andrew.

From aph at redhat.com  Thu Apr  7 10:17:15 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 7 Apr 2016 11:17:15 +0100
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57062D00.5000609@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com>
	<57062084.4020209@oracle.com> <57062963.5070403@redhat.com>
	<57062D00.5000609@oracle.com>
Message-ID: <570633AB.6040704@redhat.com>

On 07/04/16 10:48, Aleksey Shipilev wrote:
> On 04/07/2016 12:33 PM, Andrew Haley wrote:
>> On 07/04/16 09:55, Aleksey Shipilev wrote:
>>> On 04/07/2016 11:51 AM, Andrew Haley wrote:
>>>> I'm very tempted to review this, but there is a rather odd thing:
>>>> the bug does not explain the motivation for this change.
>>>
>>> Not following you.
>>>
>>> What motivation do you need apart from "...the rest of new Unsafe
>>> methods that are not intrinsified by C1 are handled by Java fallbacks in
>>> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"?
>>
>> As far as I know all of these are handled by native methods.
>> Is that not correct?  They seem to be.
> 
> Yes. This is not a question of implementing CompareAndExchange: it is
> handled well by the unsafe.cpp bits, and this is why the runtime can work
> even without the compilers in the picture.

I thought so.

> This is about getting CompareAndExchange support in C1 in sync with
> CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native
> calls, and should do the same for new CompareAndExchange.

So, this is entirely about making C1-generated code more efficient,
by avoiding native calls.  Right?

Andrew.


From aleksey.shipilev at oracle.com  Thu Apr  7 10:26:28 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 7 Apr 2016 13:26:28 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <570633AB.6040704@redhat.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com>
	<57062084.4020209@oracle.com> <57062963.5070403@redhat.com>
	<57062D00.5000609@oracle.com> <570633AB.6040704@redhat.com>
Message-ID: <570635D4.8020308@oracle.com>

On 04/07/2016 01:17 PM, Andrew Haley wrote:
> On 07/04/16 10:48, Aleksey Shipilev wrote:
>> On 04/07/2016 12:33 PM, Andrew Haley wrote:
>>> On 07/04/16 09:55, Aleksey Shipilev wrote:
>>>> On 04/07/2016 11:51 AM, Andrew Haley wrote:
>>>>> I'm very tempted to review this, but there is a rather odd thing:
>>>>> the bug does not explain the motivation for this change.
>>>>
>>>> Not following you.
>>>>
>>>> What motivation do you need apart from "...the rest of new Unsafe
>>>> methods that are not intrinsified by C1 are handled by Java fallbacks in
>>>> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"?
>>>
>>> As far as I know all of these are handled by native methods.
>>> Is that not correct?  They seem to be.
>>
>> Yes. This is not a question of implementing CompareAndExchange: it is
>> handled well by the unsafe.cpp bits, and this is why the runtime can work
>> even without the compilers in the picture.
> 
> I thought so.
> 
>> This is about getting CompareAndExchange support in C1 in sync with
>> CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native
>> calls, and should do the same for new CompareAndExchange.
> 
> So, this is entirely about making C1-generated code more efficient,
> by avoiding native calls.  Right?

Yes, for the same reason that CompareAndSwap and {get,put}* are intrinsified
by C1.

-Aleksey



From martin.doerr at sap.com  Thu Apr  7 10:51:41 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 7 Apr 2016 10:51:41 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <570632FF.7090103@redhat.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
	<f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
	<570632FF.7090103@redhat.com>
Message-ID: <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap>

Hi Andrew, Jamsheed and all,

thank you very much for your input.

Andrew, Jamsheed and I agree that it's better to have a releasing store in increment_count().
Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also correct).

My change still contains a releasing store for newly created ExceptionCache instances.
As Jamsheed has pointed out, this should not be strictly required, since we have the other barrier. Omitting it could at most produce additional false negatives on platforms with weak memory models.
I think having the release doesn't hurt too much and makes the design a little cleaner.
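A minimal sketch of publishing new slots through a releasing store of the count, assuming std::atomic as a portable stand-in for HotSpot's own primitives. CacheBlock and its capacity are hypothetical. As in the discussion, writers are serialized by a lock. Readers take no lock, read the count once with acquire semantics, and only inspect slots below it. Declaring the count as std::atomic rather than volatile also keeps the concurrent read well defined under the C++11 memory model.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical fixed-capacity block of (key, value) slots.
class CacheBlock {
public:
    // Writer: fill the slot, then publish it by storing the incremented count
    // with release semantics (instead of issuing a separate StoreStore barrier).
    bool add(int key, int value) {
        std::lock_guard<std::mutex> guard(lock_);
        int n = count_.load(std::memory_order_relaxed);  // stable: writers hold the lock
        if (n == kCapacity) return false;
        keys_[n] = key;
        values_[n] = value;
        count_.store(n + 1, std::memory_order_release);
        return true;
    }

    // Reader (no lock): slots below the acquired count are fully written.
    int find(int key) const {
        int n = count_.load(std::memory_order_acquire);
        for (int i = 0; i < n; i++) {
            if (keys_[i] == key) return values_[i];
        }
        return -1;
    }

private:
    static constexpr int kCapacity = 4;
    int keys_[kCapacity] = {};
    int values_[kCapacity] = {};
    std::atomic<int> count_{0};
    std::mutex lock_;  // taken by writers only
};
```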

I also added comments based on your input.

The new webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/

Please review. I will also need a sponsor from Oracle, please.

Thanks again and best regards,
Martin


-----Original Message-----
From: Andrew Haley [mailto:aph at redhat.com] 
Sent: Donnerstag, 7. April 2016 12:14
To: Doerr, Martin <martin.doerr at sap.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

On 07/04/16 10:08, Doerr, Martin wrote:

> atomic update for the _count would only be required if there were
> multiple threads which attempt to increment it
> concurrently. However, updates are under lock, so we only have
> concurrent readers which is ok. 
> 
> I still think "volatile" does what we need here. Especially the xlC
> compiler on AIX tends to reload variables from memory. Exactly this
> can be prevented by making the field volatile.

I think your latest patch is OK.  Whether volatile is really good
enough, I don't know.  The new(ish) C++ memory model treats this as a
race, and therefore undefined behaviour.  Old C++ didn't have a memory
model, so the best we can do with racy code is guess about what our
compilers might do.

I certainly much prefer a release_store to the storestore fence used
in the fix for 8143897.

Andrew.

From aph at redhat.com  Thu Apr  7 11:34:54 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 7 Apr 2016 12:34:54 +0100
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <570635D4.8020308@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com>
	<57062084.4020209@oracle.com> <57062963.5070403@redhat.com>
	<57062D00.5000609@oracle.com> <570633AB.6040704@redhat.com>
	<570635D4.8020308@oracle.com>
Message-ID: <570645DE.4070708@redhat.com>

On 04/07/2016 11:26 AM, Aleksey Shipilev wrote:
>> So, this is entirely about making C1-generated code more efficient,
>> by avoiding native calls.  Right?
> Yes, for the same reason that CompareAndSwap and {get,put}* are intrinsified
> by C1.

OK, I'll look at this today.

Andrew.


From pavel.punegov at oracle.com  Thu Apr  7 13:32:53 2016
From: pavel.punegov at oracle.com (Pavel Punegov)
Date: Thu, 7 Apr 2016 16:32:53 +0300
Subject: RFR(XS): 8140354: Unquarantine tests that failed with OutOfMemoryError
Message-ID: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com>

Hi,

please review this fix to unquarantine the CompilerControl tests now that JDK-8140354 [1] has been closed as a duplicate of JDK-8144621 [2].
The latter fixed the main issue that caused the OOME in the tests: it disabled the generation of *.* patterns for compile commands like "print", which produced a lot of output from the test VM.

[1] https://bugs.openjdk.java.net/browse/JDK-8140354
[2] https://bugs.openjdk.java.net/browse/JDK-8144621
--
webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/
bug https://bugs.openjdk.java.net/browse/JDK-8153661
-- Pavel.


From igor.ignatyev at oracle.com  Thu Apr  7 13:43:01 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 7 Apr 2016 16:43:01 +0300
Subject: RFR(XS): 8140354: Unquarantine tests that failed with
	OutOfMemoryError
In-Reply-To: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com>
References: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com>
Message-ID: <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com>

Pavel,

looks good to me

-- Igor
> On Apr 7, 2016, at 4:32 PM, Pavel Punegov <pavel.punegov at oracle.com> wrote:
> 
> Hi,
> 
> please review this fix to unquarantine the CompilerControl tests now that JDK-8140354 [1] has been closed as a duplicate of JDK-8144621 [2].
> The latter fixed the main issue that caused the OOME in the tests: it disabled the generation of *.* patterns for compile commands like "print", which produced a lot of output from the test VM.
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8140354
> [2] https://bugs.openjdk.java.net/browse/JDK-8144621
> --
> webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/
> bug https://bugs.openjdk.java.net/browse/JDK-8153661
> 
> -- Pavel.
> 


From aph at redhat.com  Thu Apr  7 14:29:31 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 7 Apr 2016 15:29:31 +0100
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57060C9C.4000000@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com>
Message-ID: <57066ECB.2010502@redhat.com>

On 04/07/2016 08:30 AM, Aleksey Shipilev wrote:
> On 04/01/2016 05:37 PM, Aleksey Shipilev wrote:
>> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote:
>>> I would like to solicit comments for C1 support for new
>>> Unsafe.compareAndExchange intrinsics (we have support for them in C2).
>>> The rest of new Unsafe methods that are not intrinsified by C1 are
>>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be
>>> emulated with existing APIs.
>>>
>>> Bug:
>>>   https://bugs.openjdk.java.net/browse/JDK-8152753
>>>
>>> Webrev:
>>>   http://cr.openjdk.java.net/~shade/8152753/webrev.00/
>>
>> Update:
>>   http://cr.openjdk.java.net/~shade/8152753/webrev.01/
>>
>> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some
>> other cleanups.
>>
>> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT
>> hs-comp testset (some unrelated timeouts on SPARC).
> 
> Anyone?

Reviewed, OK.

Andrew.


From felix.yang at linaro.org  Thu Apr  7 15:01:37 2016
From: felix.yang at linaro.org (Felix Yang)
Date: Thu, 7 Apr 2016 23:01:37 +0800
Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair
Message-ID: <CACc5Y6TJb-X1AM9sq9-iLyUWD1sPvsPhy2BbcxJf70rNSXfrFg@mail.gmail.com>

Hi,

    Please review webrev:
http://cr.openjdk.java.net/~fyang/8153713/webrev.00/
    JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713

    Currently, the C2 compiler generates independent stores to clear short
arrays whose size is no larger than InitArrayShortSize (see
ClearArrayNode::Ideal).  On aarch64, the store pair (stp) instruction can
zero two memory words at a time, which is good for code size and possibly
for performance on some microarchitectures.

    For Spark Terasort, the patch generates 550 additional stp (xzr, xzr)
instructions, which means a code size reduction of about 2KB.
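The code-size figure can be sanity-checked with simple arithmetic, assuming each stp replaces two single-word str instructions. AArch64 instructions are a fixed 4 bytes, so each pairing saves 4 bytes, and 550 × 4 = 2,200 bytes, or roughly 2KB. The portable C++ sketch below models the store counts and a pairwise clearing loop. The function names are hypothetical, and this is not the aarch64.ad code from the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of store instructions needed to zero `words` 8-byte words.
// Without pairing: one str xzr per word.
// With pairing: one stp xzr, xzr per two words, plus one str for an odd tail.
std::size_t zeroing_stores(std::size_t words, bool use_pairs) {
    return use_pairs ? words / 2 + words % 2 : words;
}

// AArch64 instructions are 4 bytes each, so every store saved is 4 bytes.
std::size_t bytes_saved(std::size_t words) {
    return (zeroing_stores(words, false) - zeroing_stores(words, true)) * 4;
}

// Portable model of pairwise clearing: each loop iteration stands for one
// stp, and the tail assignment stands for one str.
void zero_words_pairwise(std::uint64_t* p, std::size_t words) {
    std::size_t i = 0;
    for (; i + 1 < words; i += 2) {
        p[i] = 0;
        p[i + 1] = 0;
    }
    if (i < words) {
        p[i] = 0;  // odd trailing word
    }
}
```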

    Tested with JTreg hotspot, langtools and jdk.  Is it OK?

Thanks,
Felix

From vladimir.x.ivanov at oracle.com  Thu Apr  7 15:22:39 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 7 Apr 2016 18:22:39 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <56FE87C2.50002@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
Message-ID: <57067B3F.4060403@oracle.com>

Aleksey,

Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not 
Compiler::is_intrinsic_supported?

vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you 
effectively completely disable the intrinsics on all non-x86 platforms.

I suggest moving the InlineCompareAndExchange flag into c1_globals.hpp and 
checking it in Compiler::is_intrinsic_supported.
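A toy model of why the placement of the flag check matters. The function names are hypothetical and this is not HotSpot code. A check in the shared flag-sensing path is consulted by both compilers, so turning the flag off also hides the intrinsic from C2. A C1-only flag checked in C1's own support query leaves C2 untouched.

```cpp
#include <cassert>

enum class CompilerKind { C1, C2 };

// Flag sensed in the shared path (as in webrev.01): flag == false
// disables the intrinsic for C1 and C2 alike.
bool allowed_shared(bool flag, CompilerKind /*compiler*/) {
    return flag;
}

// Flag sensed only in C1's support check (the suggestion above):
// it never affects C2.
bool allowed_c1_only(bool flag, CompilerKind compiler) {
    return compiler == CompilerKind::C2 || flag;
}
```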

Best regards,
Vladimir Ivanov

On 4/1/16 5:37 PM, Aleksey Shipilev wrote:
> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote:
>> I would like to solicit comments for C1 support for new
>> Unsafe.compareAndExchange intrinsics (we have support for them in C2).
>> The rest of new Unsafe methods that are not intrinsified by C1 are
>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be
>> emulated with existing APIs.
>>
>> Bug:
>>    https://bugs.openjdk.java.net/browse/JDK-8152753
>>
>> Webrev:
>>    http://cr.openjdk.java.net/~shade/8152753/webrev.00/
>
> Update:
>    http://cr.openjdk.java.net/~shade/8152753/webrev.01/
>
> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some
> other cleanups.
>
> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT
> hs-comp testset (some unrelated timeouts on SPARC).
>
> Thanks,
> -Aleksey
>

From aleksey.shipilev at oracle.com  Thu Apr  7 15:34:08 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 7 Apr 2016 18:34:08 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57067B3F.4060403@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57067B3F.4060403@oracle.com>
Message-ID: <57067DF0.5090707@oracle.com>

On 04/07/2016 06:22 PM, Vladimir Ivanov wrote:
> Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not
> Compiler::is_intrinsic_supported?
>
> vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you
> effectively completely disable the intrinsics on all non-x86 platforms.
> 
> I suggest moving the InlineCompareAndExchange flag into c1_globals.hpp and
> checking it in Compiler::is_intrinsic_supported.

I actually did that originally, see:
  http://cr.openjdk.java.net/~shade/8152753/webrev.00/

...but then moved that to vmIntrinsics::is_disabled_by_flags by John's
suggestion -- all flag sensing is done there. It is a matter of
approach, really, and I think current version aligns better with the
existing intrinsic flags.

Non-x86 platforms have not yet implemented CAE intrinsics, and this
forces their hand to implement both C1 and C2 parts before flipping the
platform-dependent flag. That may or may not be a good thing, but I
don't have a preference either way.

-Aleksey

> On 4/1/16 5:37 PM, Aleksey Shipilev wrote:
>> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote:
>>> I would like to solicit comments for C1 support for new
>>> Unsafe.compareAndExchange intrinsics (we have support for them in C2).
>>> The rest of new Unsafe methods that are not intrinsified by C1 are
>>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be
>>> emulated with existing APIs.
>>>
>>> Bug:
>>>    https://bugs.openjdk.java.net/browse/JDK-8152753
>>>
>>> Webrev:
>>>    http://cr.openjdk.java.net/~shade/8152753/webrev.00/
>>
>> Update:
>>    http://cr.openjdk.java.net/~shade/8152753/webrev.01/
>>
>> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some
>> other cleanups.
>>
>> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT
>> hs-comp testset (some unrelated timeouts on SPARC).
>>
>> Thanks,
>> -Aleksey
>>



From aph at redhat.com  Thu Apr  7 15:35:14 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 7 Apr 2016 16:35:14 +0100
Subject: RFR: 8153713 : aarch64: improve short array clearing using store
	pair
In-Reply-To: <CACc5Y6TJb-X1AM9sq9-iLyUWD1sPvsPhy2BbcxJf70rNSXfrFg@mail.gmail.com>
References: <CACc5Y6TJb-X1AM9sq9-iLyUWD1sPvsPhy2BbcxJf70rNSXfrFg@mail.gmail.com>
Message-ID: <57067E32.3010403@redhat.com>

On 04/07/2016 04:01 PM, Felix Yang wrote:
>     Please review webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.00/
>     JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713
> 
>     Currently, the C2 compiler generates independent stores to clear
>     short arrays whose size is no larger than InitArrayShortSize
>     (see ClearArrayNode::Ideal). On aarch64, the store pair (stp)
>     instruction can zero two memory words at a time, which is good
>     for code size and possibly for performance on some
>     microarchitectures.
> 
>     For Spark Terasort, the patch generates 550 additional stp
>     (xzr, xzr) instructions, which means a code size reduction of
>     about 2KB.
> 
>     Tested with JTreg hotspot, langtools and jdk.  Is it OK?

It looks reasonable.  It's rather a big slab of code for aarch64.ad,
and I think that it should be in MacroAssembler.  Long Chen created
MacroAssembler::zero_words, and you should create an overload of
zero_words which takes a constant int as an argument.
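The suggested overload can be sketched with a mock assembler that only records instruction text. Everything here is hypothetical: Reg, MockMacroAssembler, and the placeholder output of the register-count overload are not HotSpot's MacroAssembler API. The point is the design. When the word count is a constant at code-generation time, the overload can unroll into stp pairs plus at most one trailing str, which keeps that logic out of aarch64.ad.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical register handle carrying only a printable name.
struct Reg {
    const char* name;
};

class MockMacroAssembler {
public:
    std::vector<std::string> code;  // emitted instructions, as text

    // Count held in a register (known only at run time): a placeholder
    // standing in for a loop-based zeroing sequence.
    void zero_words(Reg base, Reg count) {
        code.push_back(std::string("<zeroing loop over [") + base.name +
                       "], count in " + count.name + ">");
    }

    // Count known when the code is generated: unroll into stp pairs of
    // 8-byte words plus at most one trailing str. Intended for short arrays
    // only, since stp's immediate offset range is limited.
    void zero_words(Reg base, int count) {
        int offset = 0;
        for (; count >= 2; count -= 2, offset += 16) {
            code.push_back(std::string("stp xzr, xzr, [") + base.name + ", #" +
                           std::to_string(offset) + "]");
        }
        if (count == 1) {
            code.push_back(std::string("str xzr, [") + base.name + ", #" +
                           std::to_string(offset) + "]");
        }
    }
};
```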

Andrew.


From rwestrel at redhat.com  Thu Apr  7 15:41:27 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 7 Apr 2016 17:41:27 +0200
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57060C9C.4000000@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com>
Message-ID: <57067FA7.7030907@redhat.com>


>>   http://cr.openjdk.java.net/~shade/8152753/webrev.01/

That c1_LIR.cpp change (line 931):

      if (!opCompareAndSwap->_exchange) do_input(opCompareAndSwap->_cmp_value);

doesn't seem right. Why do you need it?

Roland.

From aleksey.shipilev at oracle.com  Thu Apr  7 15:46:04 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 7 Apr 2016 18:46:04 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57067FA7.7030907@redhat.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com>
Message-ID: <570680BC.7030305@oracle.com>

On 04/07/2016 06:41 PM, Roland Westrelin wrote:
> 
>>>   http://cr.openjdk.java.net/~shade/8152753/webrev.01/
> 
> That c1_LIR.cpp change (line 931):
> 
>       if (!opCompareAndSwap->_exchange) do_input(opCompareAndSwap->_cmp_value);
> 
> doesn't seem right. Why do you need it?

CompareAndSwap produces a boolean result, and kills cmp_value and
new_value. CompareAndExchange produces the "old"/null value result,
which is stored at the same position as cmp_value.

So, if you omit that line, LinearScan asserts when you try to use
the result of CompareAndExchange. AFAIU, removing the "input" property
from cmp_value, but leaving "temp", puts things back in order for
CompareAndExchange.

Thanks,
-Aleksey


From vladimir.x.ivanov at oracle.com  Thu Apr  7 16:01:31 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 7 Apr 2016 19:01:31 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <57067DF0.5090707@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57067B3F.4060403@oracle.com> <57067DF0.5090707@oracle.com>
Message-ID: <5706845B.2080007@oracle.com>

Ok, I thought C2 support was already there on non-x86 platforms.

I'm fine with both approaches then.

Best regards,
Vladimir Ivanov

On 4/7/16 6:34 PM, Aleksey Shipilev wrote:
> On 04/07/2016 06:22 PM, Vladimir Ivanov wrote:
>> Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not
>> Compiler::is_intrinsic_supported?
>>
>> vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you
>> effectively completely disable the intrinsics on all non-x86 platforms.
>>
>> I suggest moving the InlineCompareAndExchange flag into c1_globals.hpp and
>> checking it in Compiler::is_intrinsic_supported.
>
> I actually did that originally, see:
>    http://cr.openjdk.java.net/~shade/8152753/webrev.00/
>
> ...but then moved that to vmIntrinsics::is_disabled_by_flags by John's
> suggestion -- all flag sensing is done there. It is a matter of
> approach, really, and I think current version aligns better with the
> existing intrinsic flags.
>
> Non-x86 platforms have not yet implemented CAE intrinsics, and this
> forces their hand to implement both C1 and C2 parts before flipping the
> platform-dependent flag. That may or may not be a good thing, but I
> don't have a preference either way.
>
> -Aleksey
>
>> On 4/1/16 5:37 PM, Aleksey Shipilev wrote:
>>> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote:
>>>> I would like to solicit comments for C1 support for new
>>>> Unsafe.compareAndExchange intrinsics (we have support for them in C2).
>>>> The rest of new Unsafe methods that are not intrinsified by C1 are
>>>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be
>>>> emulated with existing APIs.
>>>>
>>>> Bug:
>>>>     https://bugs.openjdk.java.net/browse/JDK-8152753
>>>>
>>>> Webrev:
>>>>     http://cr.openjdk.java.net/~shade/8152753/webrev.00/
>>>
>>> Update:
>>>     http://cr.openjdk.java.net/~shade/8152753/webrev.01/
>>>
>>> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some
>>> other cleanups.
>>>
>>> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT
>>> hs-comp testset (some unrelated timeouts on SPARC).
>>>
>>> Thanks,
>>> -Aleksey
>>>
>
>

From doug.simon at oracle.com  Thu Apr  7 20:27:32 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Thu, 7 Apr 2016 22:27:32 +0200
Subject: RFR: 8153782: update JVMCI sources to Eclipse 4.5.2 format style
Message-ID: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8153782
http://cr.openjdk.java.net/~dnsimon/8153782/

-Doug

From christian.thalinger at oracle.com  Thu Apr  7 20:51:04 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 7 Apr 2016 10:51:04 -1000
Subject: RFR: 8153782: update JVMCI sources to Eclipse 4.5.2 format style
In-Reply-To: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com>
References: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com>
Message-ID: <E2CDFD42-91C9-404B-89D2-A574276A1ED6@oracle.com>

Looks good.

> On Apr 7, 2016, at 10:27 AM, Doug Simon <doug.simon at oracle.com> wrote:
> 
> https://bugs.openjdk.java.net/browse/JDK-8153782
> http://cr.openjdk.java.net/~dnsimon/8153782/
> 
> -Doug


From bharadwaj.yadavalli at oracle.com  Thu Apr  7 23:36:25 2016
From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli)
Date: Thu, 7 Apr 2016 19:36:25 -0400
Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable
	diagnostic options
Message-ID: <5706EEF9.4030900@oracle.com>

Backing out the change [1] that fixed [2].

Bug: https://bugs.openjdk.java.net/browse/JDK-8153655
webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/

Testing: Ran the tests in bug report successfully using product build.

Thanks,

Bharadwaj

[1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b
[2] https://bugs.openjdk.java.net/browse/JDK-8145348


From bharadwaj.yadavalli at oracle.com  Fri Apr  8 00:01:59 2016
From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli)
Date: Thu, 7 Apr 2016 20:01:59 -0400
Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable
	diagnostic options
In-Reply-To: <5706F338.9010301@oracle.com>
References: <5706EEF9.4030900@oracle.com> <5706F338.9010301@oracle.com>
Message-ID: <5706F4F7.2050602@oracle.com>

Thanks, Jesper!

Bharadwaj

On 04/07/2016 07:54 PM, Jesper Wilhelmsson wrote:
> Looks good!
> /Jesper
>
> Den 8/4/16 kl. 01:36, skrev S. Bharadwaj Yadavalli:
>> Backing out the change [1] that fixed [2].
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655
>> webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/
>>
>> Testing: Ran the tests in bug report successfully using product build.
>>
>> Thanks,
>>
>> Bharadwaj
>>
>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b
>> [2] https://bugs.openjdk.java.net/browse/JDK-8145348
>>
>>


From vladimir.kozlov at oracle.com  Fri Apr  8 00:20:30 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Apr 2016 17:20:30 -0700
Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable
	diagnostic options
In-Reply-To: <5706EEF9.4030900@oracle.com>
References: <5706EEF9.4030900@oracle.com>
Message-ID: <5706F94E.2070803@oracle.com>

No. This is a very wrong change! The bug states that the -XX:+UnlockDiagnosticVMOptions flag should be added to the tests that are missing it, not that
the 8145348 changes should be reverted.

Vladimir

On 4/7/16 4:36 PM, S. Bharadwaj Yadavalli wrote:
> Backing out the change [1] that fixed [2].
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655
> webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/
>
> Testing: Ran the tests in bug report successfully using product build.
>
> Thanks,
>
> Bharadwaj
>
> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b
> [2] https://bugs.openjdk.java.net/browse/JDK-8145348
>
>

From vladimir.kozlov at oracle.com  Fri Apr  8 01:11:16 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Apr 2016 18:11:16 -0700
Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable
	diagnostic options
In-Reply-To: <5706F94E.2070803@oracle.com>
References: <5706EEF9.4030900@oracle.com> <5706F94E.2070803@oracle.com>
Message-ID: <57070534.80509@oracle.com>

I talked with Bharadwaj and we decided to push the backout under a different bug:

8153816: Backout changes for JDK-8145348 till 8153655 is fixed

and use 8153655 for the real fix, as its synopsis says.

Thanks,
Vladimir

On 4/7/16 5:20 PM, Vladimir Kozlov wrote:
> No. This is a very wrong change! The bug states that the -XX:+UnlockDiagnosticVMOptions flag should be added to the tests that are missing it, not that
> the 8145348 changes should be reverted.
>
> Vladimir
>
> On 4/7/16 4:36 PM, S. Bharadwaj Yadavalli wrote:
>> Backing out the change [1] that fixed [2].
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655
>> webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/
>>
>> Testing: Ran the tests in bug report successfully using product build.
>>
>> Thanks,
>>
>> Bharadwaj
>>
>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b
>> [2] https://bugs.openjdk.java.net/browse/JDK-8145348
>>
>>

From bharadwaj.yadavalli at oracle.com  Fri Apr  8 02:39:51 2016
From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli)
Date: Thu, 7 Apr 2016 22:39:51 -0400
Subject: RFR: 8153816: [BACKOUT] Make intrinsics flags diagnostic
Message-ID: <570719F7.7010404@oracle.com>

Backing out the change [1] that fixed [2]. This is a sub-task of [3].

Bug: https://bugs.openjdk.java.net/browse/JDK-8153816
webrev: http://cr.openjdk.java.net/~bharadwaj/8153816/webrev/

Testing: Ran the tests in bug report successfully using product build.

Thanks,

Bharadwaj

[1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b
[2] https://bugs.openjdk.java.net/browse/JDK-8145348
[3] https://bugs.openjdk.java.net/browse/JDK-8153655


From vladimir.kozlov at oracle.com  Fri Apr  8 02:48:00 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Apr 2016 19:48:00 -0700
Subject: RFR: 8153816: [BACKOUT] Make intrinsics flags diagnostic
In-Reply-To: <570719F7.7010404@oracle.com>
References: <570719F7.7010404@oracle.com>
Message-ID: <57071BE0.5000708@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/7/16 7:39 PM, S. Bharadwaj Yadavalli wrote:
> Backing out the change [1] that fixed [2]. This is a sub-task of [3].
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153816
> webrev: http://cr.openjdk.java.net/~bharadwaj/8153816/webrev/
>
> Testing: Ran the tests in the bug report successfully using a product build.
>
> Thanks,
>
> Bharadwaj
>
> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b
> [2] https://bugs.openjdk.java.net/browse/JDK-8145348
> [3] https://bugs.openjdk.java.net/browse/JDK-8153655
>

From HORII at jp.ibm.com  Fri Apr  8 10:53:48 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Fri, 8 Apr 2016 10:53:48 +0000
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
Message-ID: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com>

Dear all:

Can I please request reviews for the following change?
This change was created for JDK 9 and ppc64.

Description:
This change adds compare-and-exchange options for the POWER architecture.
As described in atomic_linux_ppc.inline.hpp, the current implementation of
cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
general purposes because the two sync instructions, one before and one after
cmpxchg, keep memory consistent. However, they sometimes cause overhead because
sync instructions are very expensive in the current POWER chip design.
With this change, callers can explicitly specify whether to run the fence and
the acquire through two additional bool parameters. Because their default
values are "true", existing cmpxchg calls do not need to be modified.

In addition, using the new cmpxchg parameters, this change improves the
performance of copy_to_survivor in the parallel GC.
copy_to_survivor installs forwarding pointers using cmpxchg. In my
understanding, this operation does not require any sync instructions:
a pointer is changed at most once per GC, and when cmpxchg fails,
the latest pointer is available to the caller.
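A minimal Java model of the forward-once protocol just described, under stated assumptions: `ForwardOnce` and `forwardTo` are hypothetical names, and an `AtomicReference` stands in for the mark word. It uses `AtomicReference.compareAndExchange` (Java 9+), which has full volatile ordering, so it shows only why a failed CAS still hands the caller the winning pointer; it does not show whether the sync instructions can safely be dropped on POWER.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model, not HotSpot code: an AtomicReference stands in for the mark word.
public class ForwardOnce {
    // null means "not forwarded yet"; otherwise it holds the copy.
    private final AtomicReference<Object> forwardee = new AtomicReference<>();

    /**
     * Tries to install candidate as the forwardee and returns whichever copy won:
     * candidate if this caller's CAS succeeded, otherwise the copy another thread
     * installed first.
     */
    public Object forwardTo(Object candidate) {
        // compareAndExchange returns the witness value: null (== expected) on success,
        // the already-installed forwardee on failure, so the loser needs no extra read.
        Object witness = forwardee.compareAndExchange(null, candidate);
        return witness == null ? candidate : witness;
    }

    public static void main(String[] args) {
        ForwardOnce header = new ForwardOnce();
        Object first = new Object();
        Object second = new Object();
        System.out.println(header.forwardTo(first) == first);   // true: first CAS wins
        System.out.println(header.forwardTo(second) == first);  // true: loser gets the winner back
    }
}
```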

When I evaluated SPECjbb2013 (slightly customized because the obsolete grizzly
does not support the new Java 9 version-string format), young GC pause time was
reduced by 10% to 20%.

Summary of source code changes:

* src/share/vm/runtime/atomic.hpp
* src/share/vm/runtime/atomic.cpp
* src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
       - Add fence and acquire arguments to cmpxchg, only for PPC64.
         Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches,
         they are eliminated when it is inlined into callers.

* src/share/vm/oops/oop.inline.hpp
      - Changed cas_set_mark to call cmpxchg without fence and acquire.
         cas_set_mark is called only by cas_forward_to, which is called only by
         copy_to_survivor_space and oop_promotion_failed in
         psPromotionManager.

Code change:

   Please see an attached diff file that was generated with "hg diff -g" 
   under the latest hotspot directory.

Passed test:
    SPECjbb2013 (customized)

* I believe some other cmpxchg calls could also be optimized by dropping the
  fence or the acquire, because two sync instructions are more conservative
  than the Java memory model requires.


Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160408/6be02528/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64_cmpxchg_opt.diff
Type: application/octet-stream
Size: 8837 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160408/6be02528/ppc64_cmpxchg_opt-0001.diff>

From nils.eliasson at oracle.com  Fri Apr  8 12:27:27 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 8 Apr 2016 14:27:27 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
Message-ID: <5707A3AF.3040807@oracle.com>

Hi,

Please review this small fix of the BlockingCompilation test.

Summary:
A method enqueued for compilation with the WB API may be removed from the
compile queue as stale.

Solution:
Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets stale
while the test is running. (Also added some extra checks so that a failing
run may fail fast instead of waiting for the timeout.)

This is a workaround, but we should consider a permanent fix
for WB API compiles, such as tagging the compile task with information about
its origin. The comment field has this information, but it would need to be
converted to an enum.
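The permanent fix suggested above (tagging each compile task with its origin) could look roughly like the following model. It is a hypothetical Java sketch: `CompileQueueModel`, `Origin` and `removeStale` are illustrative names, not HotSpot's C++ `CompileTask` or compile-queue code, and staleness is reduced to a plain age threshold rather than the real policy's rule.

```java
import java.util.ArrayDeque;
import java.util.Iterator;

// Hypothetical model of the proposed fix; HotSpot's CompileTask and queue are C++ and differ.
public class CompileQueueModel {
    enum Origin { TIERED_POLICY, WHITEBOX }

    static final class Task {
        final String method;
        final Origin origin;
        final long enqueuedAtMillis;

        Task(String method, Origin origin, long enqueuedAtMillis) {
            this.method = method;
            this.origin = origin;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final ArrayDeque<Task> queue = new ArrayDeque<>();
    private final long timeoutMillis;

    CompileQueueModel(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void add(Task task) {
        queue.add(task);
    }

    /** Drops tasks older than the timeout, except those requested through the WhiteBox API. */
    void removeStale(long nowMillis) {
        Iterator<Task> it = queue.iterator();
        while (it.hasNext()) {
            Task task = it.next();
            boolean stale = nowMillis - task.enqueuedAtMillis > timeoutMillis;
            if (stale && task.origin != Origin.WHITEBOX) {
                it.remove();
            }
        }
    }

    int size() {
        return queue.size();
    }
}
```

With the origin available, a test no longer depends on a large timeout to keep its WB-requested compile in the queue.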

Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/

Best regards,
Nils Eliasson


From rwestrel at redhat.com  Fri Apr  8 13:54:32 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 8 Apr 2016 15:54:32 +0200
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <570680BC.7030305@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com>
	<570680BC.7030305@oracle.com>
Message-ID: <5707B818.6070405@redhat.com>


> CompareAndSwap produces boolean result, and kills cmp_value and
> new_value. CompareAndExchange produces the "old"/null value result,
> which is stored at the same position as cmp_value.
> 
> So, if you omit that line, LinearScan asserts when you are trying to use
> the result of CompareAndExchange. AFAIU, removing the "input" property
> from cmp_value, but leaving "temp", puts things back in order for
> CompareAndExchange.

That assert seems too restrictive but making that change to c1_LIR.cpp
is asking for trouble, I think. I would suggest moving the final move to
the result register to the platform dependent code (see attached patch).
Also, I noticed you don't pass the result as the correct argument of cas_*.
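The result-shape difference in the quoted text can be seen at the Java level with `AtomicInteger` (Java 9+): `compareAndSet` reports only success, while `compareAndExchange` returns the witness value, i.e. the value found in memory, which equals the expected value exactly when the exchange succeeded. This is an illustrative sketch; `CasVsCae` is a hypothetical name and it says nothing about how C1 allocates registers for these operations.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: boolean-result CAS versus witness-returning CAE.
public class CasVsCae {
    /** Runs both flavours against a fresh cell holding 5 and reports what each returns. */
    static String demo() {
        AtomicInteger cell = new AtomicInteger(5);
        // CompareAndSwap flavour: the caller only learns success or failure.
        boolean swapped = cell.compareAndSet(4, 9);          // fails: cell still holds 5
        // CompareAndExchange flavour: the caller gets the witness value back.
        int failedWitness = cell.compareAndExchange(4, 9);   // fails, returns 5
        int okWitness = cell.compareAndExchange(5, 9);       // succeeds, returns 5 (== expected)
        return swapped + " " + failedWitness + " " + okWitness + " " + cell.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "false 5 5 9"
    }
}
```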

Roland.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cas.patch
Type: text/x-patch
Size: 2704 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160408/c72b08b6/cas.patch>

From nils.eliasson at oracle.com  Fri Apr  8 13:47:22 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 8 Apr 2016 15:47:22 +0200
Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err
In-Reply-To: <56BB7E47.4000703@oracle.com>
References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com>
	<56A13176.804@oracle.com> <56A230D7.9060606@oracle.com>
	<56A27B55.6050502@oracle.com> <56BB095A.2090500@oracle.com>
	<56BB7E47.4000703@oracle.com>
Message-ID: <5707B66A.4020006@oracle.com>

Hi,

Picking up this thread again.

On 2016-02-10 19:15, Vladimir Kozlov wrote:
> This looks almost good.
>
> There is " at the end but there is no first ":
>
That was just the closing quote marking the end of the excerpt from the hs_err file.

> MaxNodeLimit:80000"
>
> "Compiling with directive:" --> "Compiling with directives:". "No 
> inline rules in directive.", again "directives".
>
> Also the list of values is difficult to navigate. To have one per line
> is definitely overkill but organizing them in some kind of pattern
> would be nice (3 per line with the same indent, for example). It could
> be hard to do but at least order them alphabetically.

The printout is generated by a macro for simplicity. Any sorting or
formatting would require a hand-tuned print function, or a macro that builds a
list that is then sorted and printed by the print function. I don't think it
is worth the effort for the hs_err file.

>
> I asked before and have again forgotten what "Enable:true
> Exclude:false" means. Does it mean you need to add more info, e.g. "Enable
> directives:true"? What is "Exclude" again?

Enable - whether the directive may be used (if false, it is disabled, i.e. not available)
Exclude - the method must not be compiled, as with CompileCommand=exclude

I'll add comments to the flag table:

   #define compilerdirectives_common_flags(cflags) \
     cflags(Enable,                  bool, true, X)  /* false -> directive disabled from use */ \
     cflags(Exclude,                 bool, false, X) /* true  -> don't compile method */ \

>
> DisableIntrinsic: does not have value so it should not be on list. 
> Similar for others when they don't have values.
DisableIntrinsic appears to be empty because its value is the empty list.
I can print "" for clarity. All options have a value.

>
> Again what * means in "*PrintInlining:true"? Is it because it is 
> present on command line?

A * shows that it was set with a directive.
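For comparison, the formatting Vladimir asked for (alphabetical order, three options per line, empty values omitted, '*' for options set by a directive) is small when written as a standalone function. This is a hypothetical Java sketch with illustrative names (`DirectivePrinter`, `format`); the real printout comes from the C++ flag macro and would need the hand-tuned print function described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; HotSpot prints directive options from a C++ macro instead.
public class DirectivePrinter {
    /**
     * Prints options sorted by name, three per line, marking those set by a directive
     * with '*' and skipping options whose value is empty.
     */
    static String format(Map<String, String> options, Set<String> setByDirective) {
        List<String> cells = new ArrayList<>();
        // TreeMap iterates in ascending key order, giving the alphabetical listing.
        for (Map.Entry<String, String> e : new TreeMap<>(options).entrySet()) {
            if (e.getValue().isEmpty()) {
                continue;  // e.g. DisableIntrinsic holding the empty list
            }
            String mark = setByDirective.contains(e.getKey()) ? "*" : "";
            cells.add(mark + e.getKey() + ":" + e.getValue());
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i % 3 == 0) {
                out.append(i == 0 ? "  " : "\n  ");  // start a new indented line
            } else {
                out.append(' ');
            }
            out.append(cells.get(i));
        }
        return out.toString();
    }
}
```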

Regards,
Nils

>
> Thanks,
> Vladimir
>
> On 2/10/16 1:56 AM, Nils Eliasson wrote:
>> Hi,
>>
>> New webrev including Vladimir's suggestions:
>>
>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.04/
>>
>> Now, when there are no compile commands for inlining, the directive is
>> printed like this:
>>
>> "---------------  T H R E A D  ---------------
>>
>> Current thread (0x00007f53f0468000):  JavaThread "C1 
>> CompilerThread10" daemon [_thread_in_native, id=8398,
>> stack(0x00007f52e6163000,0x00007f52e6264000)]
>>
>> Current CompileTask:
>> C1:    228    1       3       java.lang.String::isLatin1 (19 bytes)
>>
>> Compiling with directive:
>>    No inline rules in directive.
>>    Enable:true Exclude:false BreakAtExecute:false 
>> BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true
>> PrintNMethods:false ReplayInline:false DumpReplay:false 
>> DumpInline:false CompilerDirectivesIgnoreCompileCommands:false
>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false 
>> PrintIntrinsics:false TraceOptoPipelining:false
>> TraceOptoOutput:false TraceSpilling:false Vectorize:false 
>> VectorizeDebug:false CloneMapDebug:false
>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000"
>>
>>
>>
>> And like this when there are:
>>
>>
>> "---------------  T H R E A D  ---------------
>>
>> Current thread (0x00007feda4468800):  JavaThread "C1 
>> CompilerThread10" daemon [_thread_in_native, id=8314,
>> stack(0x00007fec9a751000,0x00007fec9a852000)]
>>
>> Current CompileTask:
>> C1:    227    1       3       java.lang.String::isLatin1 (19 bytes)
>>
>> Compiling with directive:
>>    No inline rules in directive. Following compile commands are in use:
>>      inline: b.b, a.a
>>      dontinline: c.c
>>      exclude: d.d
>>      compileonly: *.*
>>    Enable:true Exclude:false BreakAtExecute:false 
>> BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true
>> PrintNMethods:false ReplayInline:false DumpReplay:false 
>> DumpInline:false CompilerDirectivesIgnoreCompileCommands:false
>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false 
>> PrintIntrinsics:false TraceOptoPipelining:false
>> TraceOptoOutput:false TraceSpilling:false Vectorize:false 
>> VectorizeDebug:false CloneMapDebug:false
>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000"
>>
>> Regards,
>> Nils
>>
>> On 2016-01-22 19:56, Vladimir Kozlov wrote:
>>> "no inline - compile commands may apply" is confusing to me (and to
>>> others who are not familiar with directives). What
>>> does it mean? :)
>>> Does it mean no 'inline' directives were used or the opposite:
>>> -XX:-Inline flag was specified (or corresponding directive).
>>>
>>> If it means inlining is switched off, then I think it should be "don't inline".
>>> So what does "compile commands may apply" mean?
>>>
>>> > I updated the print output to mark all options in the directive 
>>> that are
>>> > not default with a '*'. That makes it quicker to see if any special
>>>
>>> Yes, it is better but I still did not get this. I see that command 
>>> line has PrintInlining command and it is in the
>>> list: *PrintInlining:true.
>>> But I don't see PrintCompilation on the list but it is specified on 
>>> command line. On other hand PrintIntrinsics:false
>>> is there.
>>>
>>> > It only prints the directive that is used for the current compile 
>>> task
>>> > (that caused the crash). (That's why I put them together in the
>>> hs_err file)
>>>
>>> What do you mean "is used"?
>>>
>>> "Print *which* directive (and options) were in use if compiler crash.
>>>  Print *if* directives were used at some point if other crash?"
>>>
>>> Should we replace "in use"/"were used" with "were set"?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 1/22/16 5:38 AM, Nils Eliasson wrote:
>>>> Hi, Vladimir
>>>>
>>>> On 2016-01-21 20:28, Vladimir Kozlov wrote:
>>>>> Passing directives through ciEnv is fine.
>>>>> My question is about output in hs_err file. How those directives were
>>>>> selected in your example?
>>>>
>>>> It only prints the directive that is used for the current compile task
>>>> (that caused the crash). (That's why I put them together in the
>>>> hs_err file)
>>>>
>>>>> I found it strange to see mixed flag values and oracle commands.
>>>>> "Enable:true Exclude:false" - what do these correspond to, for example?
>>>>
>>>> These are all options from the directive - and they are set with
>>>> directives (highest priority), CompileCommand or VM flags (lowest
>>>> priority).
>>>>
>>>>>
>>>>> Should we not print directives/flags which are not set explicitly?
>>>>
>>>> I updated the print output to mark all options in the directive that are
>>>> not default with a '*'. That makes it quicker to see if any special
>>>> options were applied. It will also print if the directive is the
>>>> unmodified default directive.
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/
>>>> Example output:
>>>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt
>>>>
>>>> Regards,
>>>> Nils
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 1/21/16 2:31 AM, Nils Eliasson wrote:
>>>>>> This is how it looks:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> ---------------  T H R E A D  ---------------
>>>>>>
>>>>>> Current thread (0x00007f071046a000):  JavaThread "C1
>>>>>> CompilerThread10" daemon [_thread_in_native, id=20033,
>>>>>> stack(0x00007f05d7afb000,0x00007f05d7bfc000)]
>>>>>>
>>>>>> Current CompileTask:
>>>>>> C1:    225    1       3       java.lang.String::isLatin1 (19 bytes)
>>>>>>
>>>>>> Current compiler directive:
>>>>>>    inline: -
>>>>>>    Enable:true Exclude:false BreakAtExecute:false
>>>>>> BreakAtCompile:false Log:false PrintAssembly:false
>>>>>> PrintInlining:false PrintNMethods:false ReplayInline:false
>>>>>> DumpReplay:false DumpInline:false
>>>>>> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic:
>>>>>> BlockLayoutByFrequency:true PrintOptoAssembly:false
>>>>>> PrintIntrinsics:false TraceOptoPipelining:false 
>>>>>> TraceOptoOutput:false
>>>>>> TraceSpilling:false Vectorize:false VectorizeDebug:false
>>>>>> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false
>>>>>> IGVPrintLevel:0 MaxNodeLimit:80000
>>>>>>
>>>>>> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000],
>>>>>> sp=0x00007f05d7bfa5d0,  free space=1021k
>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>>>>>> C=native code)
>>>>>> V  [libjvm.so+0x12e7532]  VMError::report_and_die(int, char const*,
>>>>>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*,
>>>>>> char const*, int, unsigned long)+0x182
>>>>>> V  [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char
>>>>>> const*, int, char const*, char const*, __va_list_tag*)+0x4a
>>>>>> V  [libjvm.so+0x908cca]  report_vm_error(char const*, int, char
>>>>>> const*, char const*, ...)+0xea
>>>>>> V  [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*,
>>>>>> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1
>>>>>> V  [libjvm.so+0x88ec5a]
>>>>>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a
>>>>>> V  [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540
>>>>>> V  [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9
>>>>>> V  [libjvm.so+0x1264ac6]  JavaThread::run()+0x2a6
>>>>>> V  [libjvm.so+0x10189aa]  java_start(Thread*)+0xca
>>>>>> C  [libpthread.so.0+0x8182]  start_thread+0xc2
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt
>>>>>>
>>>>>> Regards,
>>>>>> Nils
>>>>>>
>>>>>> On 2016-01-21 11:25, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please review this small change. The diff looks big, but most of the
>>>>>>> change is just in how the directive is
>>>>>>> passed to the compilers. Directives are set in the ciEnv and then
>>>>>>> passed to the compilers. The compilers can then
>>>>>>> choose to add it to any internal compilation object for 
>>>>>>> convenience.
>>>>>>> The hs_err printing routine in vmError.cpp loads
>>>>>>> the directive from the ciEnv.
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756
>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nils
>>>>>>
>>>>
>>


From felix.yang at linaro.org  Fri Apr  8 14:36:02 2016
From: felix.yang at linaro.org (Felix Yang)
Date: Fri, 8 Apr 2016 22:36:02 +0800
Subject: RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode
Message-ID: <CACc5Y6S9XN30UOwyjo9gEtFmGyAAyCnTd0vLy_8qqEoVmJK+rQ@mail.gmail.com>

Hi,

    Please review webrev: http://cr.openjdk.java.net/~fyang/8153837/webrev.00/
    JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153837
    The patch handles code generation for the special cases of MaxINode and
MinINode where one argument is -1, 0 or 1, eliminating an extra mov instruction.
    As shown in the JIRA issue, I saw some occurrences of these special cases
in SPECjbb2005 and other benchmarks.
    The patch does something like this:

     min(x, 1)
     =>
     cmp x, 0
     csinc x, x, zr, le

     min(x, -1)
     =>
     cmp x, 0
     csinv x, x, zr, lt

     max(x, 1)
     =>
     cmp x, 0
     csinc x, x, zr, gt

     max(x, -1)
     =>
     cmp x, 0
     csinv x, x, zr, ge
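The four sequences can be checked against plain min/max with a small Java model of the conditional-select instructions. It assumes the usual AArch64 definitions, csinc d, n, m, cond = cond ? n : m + 1 and csinv d, n, m, cond = cond ? n : ~m, with zr reading as 0; `CondSelectModel` and its methods are hypothetical names.

```java
// Models the AArch64 sequences in Java to check that they compute min/max.
//   csinc d, n, m, cond  ->  d = cond ? n : m + 1
//   csinv d, n, m, cond  ->  d = cond ? n : ~m
// With m = zr (always 0): csinc yields 1 and csinv yields -1 when cond is false.
public class CondSelectModel {
    static int csincZr(int n, boolean cond) { return cond ? n : 1; }   // zr + 1
    static int csinvZr(int n, boolean cond) { return cond ? n : ~0; }  // ~zr == -1

    // "cmp x, 0" sets the flags; le/lt/gt/ge are the signed tests x<=0, x<0, x>0, x>=0.
    static int minWith1(int x)    { return csincZr(x, x <= 0); }
    static int minWithNeg1(int x) { return csinvZr(x, x < 0); }
    static int maxWith1(int x)    { return csincZr(x, x > 0); }
    static int maxWithNeg1(int x) { return csinvZr(x, x >= 0); }

    /** Returns true if all four sequences agree with Math.min/Math.max for x. */
    static boolean agrees(int x) {
        return minWith1(x) == Math.min(x, 1)
            && minWithNeg1(x) == Math.min(x, -1)
            && maxWith1(x) == Math.max(x, 1)
            && maxWithNeg1(x) == Math.max(x, -1);
    }

    public static void main(String[] args) {
        int[] edges = {Integer.MIN_VALUE, -2, -1, 0, 1, 2, Integer.MAX_VALUE};
        for (int x : edges) {
            System.out.println(x + " -> " + agrees(x));
        }
    }
}
```

The boundary values -1, 0 and 1 are where each condition code flips, so they are the inputs most worth exercising.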

    Tested with JTreg hotspot, langtools.  Is it OK?

Thanks,
Felix
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160408/39f54d6f/attachment-0001.html>

From pavel.punegov at oracle.com  Fri Apr  8 14:40:12 2016
From: pavel.punegov at oracle.com (Pavel Punegov)
Date: Fri, 8 Apr 2016 17:40:12 +0300
Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package
Message-ID: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com>

Hi,

please review the following change to JITtester:
- rewrite TypeUtil and move to utils package
- add javadoc for each method in the class

bug: https://bugs.openjdk.java.net/browse/JDK-8153852
webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/

Thanks,
Pavel Punegov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160408/460798ce/attachment.html>

From aph at redhat.com  Fri Apr  8 14:42:34 2016
From: aph at redhat.com (Andrew Haley)
Date: Fri, 8 Apr 2016 15:42:34 +0100
Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases
	for MaxINode & MinINode
In-Reply-To: <CACc5Y6S9XN30UOwyjo9gEtFmGyAAyCnTd0vLy_8qqEoVmJK+rQ@mail.gmail.com>
References: <CACc5Y6S9XN30UOwyjo9gEtFmGyAAyCnTd0vLy_8qqEoVmJK+rQ@mail.gmail.com>
Message-ID: <5707C35A.2060000@redhat.com>

On 04/08/2016 03:36 PM, Felix Yang wrote:
>  Is it OK?

It looks good, but I'm surely going to need a jtreg test case
which exercises all these patterns in C2.
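A sketch of what such a jtreg test might look like. The class name, iteration count and `@run` options (`-Xbatch`, `-XX:-TieredCompilation`) are assumptions, and it assumes C2 turns `Math.min`/`Math.max` with a constant argument into MinINode/MaxINode; that should be confirmed by inspecting the generated aarch64 code before relying on the test.

```java
/*
 * Hypothetical jtreg test sketch; name, flags and iteration count are assumptions.
 *
 * @test
 * @bug 8153837
 * @summary MinI/MaxI with the constants 1 and -1 must keep Java semantics in C2
 * @run main/othervm -Xbatch -XX:-TieredCompilation TestMinMaxConst
 */
public class TestMinMaxConst {
    static int minWith1(int x)    { return Math.min(x, 1); }
    static int minWithNeg1(int x) { return Math.min(x, -1); }
    static int maxWith1(int x)    { return Math.max(x, 1); }
    static int maxWithNeg1(int x) { return Math.max(x, -1); }

    static void check(String op, int x, int c, int actual, int expected) {
        if (actual != expected) {
            throw new RuntimeException(op + "(" + x + ", " + c + ") = " + actual
                                       + ", expected " + expected);
        }
    }

    static void runOnce(int x) {
        // Expected results are written as plain conditionals rather than Math calls.
        check("min", x, 1, minWith1(x), x <= 1 ? x : 1);
        check("min", x, -1, minWithNeg1(x), x <= -1 ? x : -1);
        check("max", x, 1, maxWith1(x), x >= 1 ? x : 1);
        check("max", x, -1, maxWithNeg1(x), x >= -1 ? x : -1);
    }

    public static void main(String[] args) {
        int[] edges = {Integer.MIN_VALUE, -2, -1, 0, 1, 2, Integer.MAX_VALUE};
        // Loop long enough that C2 is expected (not guaranteed) to compile the helpers.
        for (int i = 0; i < 20_000; i++) {
            for (int x : edges) {
                runOnce(x);
            }
        }
    }
}
```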

Thanks,

Andrew.


From vladimir.x.ivanov at oracle.com  Fri Apr  8 16:47:49 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 8 Apr 2016 19:47:49 +0300
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
Message-ID: <5707E0B5.5080501@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/

https://bugs.openjdk.java.net/browse/JDK-8153540

Unsafe.allocateInstance intrinsic can instantiate arrays, but the 
allocation logic is broken.

The proposed fix is to perform necessary checks in Java code before 
calling the intrinsic.
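A minimal sketch of such Java-side guards, under assumptions: `AllocGuard`, `isInstantiable` and `checkInstantiable` are hypothetical names, not the code in the webrev, and rejecting interfaces and abstract classes (not only arrays) is an assumption about which classes the intrinsic should never see. A null class is rejected up front with a NullPointerException.

```java
import java.lang.reflect.Modifier;
import java.util.Objects;

// Hypothetical sketch of Java-side guards; not the code in the webrev.
public class AllocGuard {
    /**
     * Returns true if cls could have a plain instance allocated: it must not be an
     * array, a primitive type (or void), an interface (including annotations), or abstract.
     */
    static boolean isInstantiable(Class<?> cls) {
        Objects.requireNonNull(cls, "cls");  // reject null before any allocation path
        return !cls.isArray()
            && !cls.isPrimitive()
            && !cls.isInterface()
            && !Modifier.isAbstract(cls.getModifiers());
    }

    /** Throws InstantiationException for classes that must never reach the intrinsic. */
    static void checkInstantiable(Class<?> cls) throws InstantiationException {
        if (!isInstantiable(cls)) {
            throw new InstantiationException(cls.getName());
        }
    }
}
```

The array and primitive checks are explicit because, per the `Class.getModifiers` specification, the abstract bit is not guaranteed for those types.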

I did some performance measurements [1] and the reflective (non-constant
class) case regressed ~5-10% due to the new guards.

I also experimented with a hotspot-only fix [2], but it looks uglier. 
So, if you consider the regression in reflective case non-critical, I'd 
prefer to go with JDK checks.

Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), 
microbenchmarks.

Thanks!

Best regards,
Vladimir Ivanov

[1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java

Baseline:
   AllocInstance.testConstant    avgt   25  3.736 ± 0.054  ns/op
   AllocInstance.testReflective  avgt   25  5.880 ± 0.080  ns/op

JDK fix:
   AllocInstance.testConstant    avgt   25  3.959 ± 0.205  ns/op
   AllocInstance.testReflective  avgt   25  6.274 ± 0.180  ns/op

[2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path

   AllocInstance.testConstant    avgt   25  3.957 ± 0.159  ns/op
   AllocInstance.testReflective  avgt   25  5.901 ± 0.057  ns/op
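Relative change against the baseline, computed from the scores above (ns/op, so a positive change means slower). These are point estimates only; several of the ± error intervals overlap.

```latex
\begin{aligned}
\text{JDK fix, testConstant:}\quad & \frac{3.959 - 3.736}{3.736} \approx +6.0\% \\
\text{JDK fix, testReflective:}\quad & \frac{6.274 - 5.880}{5.880} \approx +6.7\% \\
\text{slow path [2], testConstant:}\quad & \frac{3.957 - 3.736}{3.736} \approx +5.9\% \\
\text{slow path [2], testReflective:}\quad & \frac{5.901 - 5.880}{5.880} \approx +0.4\%
\end{aligned}
```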


From vladimir.kozlov at oracle.com  Fri Apr  8 17:04:29 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Apr 2016 10:04:29 -0700
Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package
In-Reply-To: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com>
References: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com>
Message-ID: <5707E49D.4040400@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/8/16 7:40 AM, Pavel Punegov wrote:
> Hi,
>
> please review the following change to JITtester:
> - rewrite TypeUtil and move to utils package
> - add javadoc for each method in the class
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8153852
> webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/
>
> Thanks,
> Pavel Punegov
>

From vladimir.kozlov at oracle.com  Fri Apr  8 17:09:27 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Apr 2016 10:09:27 -0700
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <5707A3AF.3040807@oracle.com>
References: <5707A3AF.3040807@oracle.com>
Message-ID: <5707E5C7.3000000@oracle.com>

What do you mean by "stale"?
I would prefer to see the real fix, as you suggested, that avoids removing WB compile tasks from the queue. Adding a timeout is not
reliable.

Thanks,
Vladimir

On 4/8/16 5:27 AM, Nils Eliasson wrote:
> Hi,
>
> Please review this small fix of the BlockingCompilation test.
>
> Summary:
> A method enqueued for compilation with the WB API may be removed from the compile queue as stale.
>
> Solution:
> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets stale while the test is running. (Also added some extra
> checks so that a failing run may fail fast instead of waiting for the timeout.)
>
> This is a workaround, but we should consider a permanent fix for WB API compiles, such as tagging the compile
> task with information about its origin. The comment field has this information, but it would need to be
> converted to an enum.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>
> Best regards,
> Nils Eliasson
>
>
>
>

From aleksey.shipilev at oracle.com  Fri Apr  8 17:28:43 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 8 Apr 2016 20:28:43 +0300
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <5707B818.6070405@redhat.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com>
	<570680BC.7030305@oracle.com> <5707B818.6070405@redhat.com>
Message-ID: <5707EA4B.8030306@oracle.com>

On 04/08/2016 04:54 PM, Roland Westrelin wrote:
> 
>> CompareAndSwap produces boolean result, and kills cmp_value and
>> new_value. CompareAndExchange produces the "old"/null value result,
>> which is stored at the same position as cmp_value.
>>
>> So, if you omit that line, LinearScan asserts when you are trying to use
>> the result of CompareAndExchange. AFAIU, removing the "input" property
>> from cmp_value, but leaving "temp", puts things back in order for
>> CompareAndExchange.
> 
> That assert seems too restrictive but making that change to c1_LIR.cpp
> is asking for trouble, I think. I would suggest moving the final move to
> the result register to the platform dependent code (see attached patch).
> Also, I noticed you don't pass the result as the correct argument of cas_*.

D'oh. Thank you, Roland!

Merged your changes here:
  http://cr.openjdk.java.net/~shade/8152753/webrev.02/

Re-spinning the RBT testing now.

Thanks,
-Aleksey


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160408/2c298e49/signature.asc>

From vladimir.kozlov at oracle.com  Fri Apr  8 17:30:16 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Apr 2016 10:30:16 -0700
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <5707E0B5.5080501@oracle.com>
References: <5707E0B5.5080501@oracle.com>
Message-ID: <5707EAA8.5020005@oracle.com>

 > Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken.

But it should not allocate arrays. Right? This is what your java changes do now.

Should it be allocateInstance0 ?:

+ // public native Object Unsafe.allocateInstance(Class<?> cls);

You removed the following null check. Is it because the Java code will catch the NULL?:
     Node* cls = null_check(argument(1));
     if (stopped())  return true;

The test is missing the @bug tag with the bug number.

Thanks,
Vladimir K

On 4/8/16 9:47 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/
>
> https://bugs.openjdk.java.net/browse/JDK-8153540
>
> Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken.
>
> The proposed fix is to perform necessary checks in Java code before calling the intrinsic.
>
> I did some performance measurements [1] and the reflective (non-constant class) case regressed ~5-10%
> due to the new guards.
>
> I also experimented with a hotspot-only fix [2], but it looks uglier. So, if you consider the regression in reflective
> case non-critical, I'd prefer to go with JDK checks.
>
> Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), microbenchmarks.
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
>
> Baseline:
>    AllocInstance.testConstant    avgt   25  3.736 ± 0.054  ns/op
>    AllocInstance.testReflective  avgt   25  5.880 ± 0.080  ns/op
>
> JDK fix:
>    AllocInstance.testConstant    avgt   25  3.959 ± 0.205  ns/op
>    AllocInstance.testReflective  avgt   25  6.274 ± 0.180  ns/op
>
> [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path
>
>    AllocInstance.testConstant    avgt   25  3.957 ± 0.159  ns/op
>    AllocInstance.testReflective  avgt   25  5.901 ± 0.057  ns/op
>

From vladimir.kozlov at oracle.com  Fri Apr  8 17:39:20 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Apr 2016 10:39:20 -0700
Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err
In-Reply-To: <5707B66A.4020006@oracle.com>
References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com>
	<56A13176.804@oracle.com> <56A230D7.9060606@oracle.com>
	<56A27B55.6050502@oracle.com> <56BB095A.2090500@oracle.com>
	<56BB7E47.4000703@oracle.com> <5707B66A.4020006@oracle.com>
Message-ID: <5707ECC8.7000107@oracle.com>

On 4/8/16 6:47 AM, Nils Eliasson wrote:
> Hi,
>
> Picking up this thread again.
>
> On 2016-02-10 19:15, Vladimir Kozlov wrote:
>> This looks almost good.
>>
>> There is " at the end but there is no first ":
>>
> That was just the closing quote marking the end of the excerpt from the hs_err file.
>
>> MaxNodeLimit:80000"
>>
>> "Compiling with directive:" --> "Compiling with directives:". "No inline rules in directive.", again "directives".
>>
>> Also the list of values is difficult to navigate. To have one per line is definitely overkill but organizing them in
>> some kind of pattern would be nice (3 per line with the same indent, for example). It could be hard to do but at least
>> order them alphabetically.
>
> The printout is generated by a macro for simplicity. Any sorting or formatting would require a hand-tuned print function, or a
> macro that builds a list that is then sorted and printed by the print function. I don't think it is worth the effort
> for the hs_err file.
>

This is really unfortunate. :(

>>
>> I asked before and have again forgotten what "Enable:true Exclude:false" means. Does it mean you need to add more info, e.g.
>> "Enable directives:true"? What is "Exclude" again?
>
> Enable - whether the directive may be used (if false, it is disabled, i.e. not available)
> Exclude - the method must not be compiled, as with CompileCommand=exclude
>
> I'll add comments to the flag table:
>
>    #define compilerdirectives_common_flags(cflags) \
>      cflags(Enable,                  bool, true, X)  /* false -> directive disabled from use */ \
>      cflags(Exclude,                 bool, false, X) /* true  -> don't compile method */ \

Should we rename these flags to make them more clear:
EnableDirective
ExcludeCompile

>
>>
>> DisableIntrinsic: does not have value so it should not be on list. Similar for others when they don't have values.
> DisableIntrinsic appears to be empty because its value is the empty list. I can print "" for clarity. All options have a
> value.

Don't create more noise when it is not needed. It is an hs_err file - it is used to understand what happened. Useless
information does not help. Even if DisableIntrinsic is specified in a directive, it should not be listed if it does not
have a value.

Actually I think it is a bug - we should not accept a directive or flag with an empty string value.

>
>>
>> Again what * means in "*PrintInlining:true"? Is it because it is present on command line?
>
> A * shows that it was set with a directive.

Okay.

Thanks,
Vladimir

>
> Regards,
> Nils
>
>>
>> Thanks,
>> Vladimir
>>
>> On 2/10/16 1:56 AM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> New webrev including Vladimir's suggestions:
>>>
>>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.04/
>>>
>>> Now, when there are no compile commands for inlining, the directive is printed like this:
>>>
>>> "---------------  T H R E A D  ---------------
>>>
>>> Current thread (0x00007f53f0468000):  JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=8398,
>>> stack(0x00007f52e6163000,0x00007f52e6264000)]
>>>
>>> Current CompileTask:
>>> C1:    228    1       3       java.lang.String::isLatin1 (19 bytes)
>>>
>>> Compiling with directive:
>>>    No inline rules in directive.
>>>    Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true
>>> PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false
>>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false
>>> TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false
>>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000"
>>>
>>>
>>>
>>> And like this when there are:
>>>
>>>
>>> "---------------  T H R E A D  ---------------
>>>
>>> Current thread (0x00007feda4468800):  JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=8314,
>>> stack(0x00007fec9a751000,0x00007fec9a852000)]
>>>
>>> Current CompileTask:
>>> C1:    227    1       3       java.lang.String::isLatin1 (19 bytes)
>>>
>>> Compiling with directive:
>>>    No inline rules in directive. Following compile commands are in use:
>>>      inline: b.b, a.a
>>>      dontinline: c.c
>>>      exclude: d.d
>>>      compileonly: *.*
>>>    Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true
>>> PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false
>>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false
>>> TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false
>>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000"
>>>
>>> Regards,
>>> Nils
>>>
>>> On 2016-01-22 19:56, Vladimir Kozlov wrote:
>>>> "no inline - compile commands may apply" is confusing to me (and to others who are not familiar with directives). What
>>>> does it mean? :)
>>>> Does it mean no 'inline' directives were used, or the opposite: the -XX:-Inline flag was specified (or a corresponding directive)?
>>>>
>>>> If it switches off inlining then I think it should be "don't inline".
>>>> So what does "compile commands may apply" mean?
>>>>
>>>> > I updated the print output to mark all options in the directive that are
>>>> > not default with a '*'. That makes it quicker to see if any special
>>>>
>>>> Yes, it is better, but I still don't get this. I see that the command line has a PrintInlining command and it is in the
>>>> list: *PrintInlining:true.
>>>> But I don't see PrintCompilation on the list, although it is specified on the command line. On the other hand, PrintIntrinsics:false
>>>> is there.
>>>>
>>>> > It only prints the directive that is used for the current compile task
>>>> > (that caused the crash). (That's why I put them together in the hs_err file)
>>>>
>>>> What do you mean "is used"?
>>>>
>>>> "Print *which* directive (and options) were in use if compiler crash.
>>>>  Print *if* directives were used at some point if other crash?"
>>>>
>>>> Should we replace "in use"/"were used" with "were set"?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 1/22/16 5:38 AM, Nils Eliasson wrote:
>>>>> Hi, Vladimir
>>>>>
>>>>> On 2016-01-21 20:28, Vladimir Kozlov wrote:
>>>>>> Passing directives through ciEnv is fine.
>>>>>> My question is about output in hs_err file. How those directives were
>>>>>> selected in your example?
>>>>>
>>>>> It only prints the directive that is used for the current compile task
>>>>> (that caused the crash). (That's why I put them together in the hs_err file)
>>>>>
>>>>>> I found it strange to see mixed flag values and oracle commands.
>>>>>> "Enable:true Exclude:false" - which do these correspond to, for example?
>>>>>
>>>>> These are all options from the directive - and they are set with
>>>>> directives (highest priority), compilecommand or vmflags (lowest
>>>>> priority).
>>>>>
>>>>>>
>>>>>> Should we not print directives/flags which are not set explicitly?
>>>>>
>>>>> I updated the print output to mark all options in the directive that are
>>>>> not default with a '*'. That makes it quicker to see if any special
>>>>> options were applied. It will also print if the directive is the
>>>>> unmodified default directive.
>>>>>
>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/
>>>>> Example output:
>>>>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt
>>>>>
>>>>> Regards,
>>>>> Nils
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 1/21/16 2:31 AM, Nils Eliasson wrote:
>>>>>>> This is how it looks:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> ---------------  T H R E A D  ---------------
>>>>>>>
>>>>>>> Current thread (0x00007f071046a000):  JavaThread "C1
>>>>>>> CompilerThread10" daemon [_thread_in_native, id=20033,
>>>>>>> stack(0x00007f05d7afb000,0x00007f05d7bfc000)]
>>>>>>>
>>>>>>> Current CompileTask:
>>>>>>> C1:    225    1       3       java.lang.String::isLatin1 (19 bytes)
>>>>>>>
>>>>>>> Current compiler directive:
>>>>>>>    inline: -
>>>>>>>    Enable:true Exclude:false BreakAtExecute:false
>>>>>>> BreakAtCompile:false Log:false PrintAssembly:false
>>>>>>> PrintInlining:false PrintNMethods:false ReplayInline:false
>>>>>>> DumpReplay:false DumpInline:false
>>>>>>> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic:
>>>>>>> BlockLayoutByFrequency:true PrintOptoAssembly:false
>>>>>>> PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false
>>>>>>> TraceSpilling:false Vectorize:false VectorizeDebug:false
>>>>>>> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false
>>>>>>> IGVPrintLevel:0 MaxNodeLimit:80000
>>>>>>>
>>>>>>> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000],
>>>>>>> sp=0x00007f05d7bfa5d0,  free space=1021k
>>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>>>>>>> C=native code)
>>>>>>> V  [libjvm.so+0x12e7532]  VMError::report_and_die(int, char const*,
>>>>>>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*,
>>>>>>> char const*, int, unsigned long)+0x182
>>>>>>> V  [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char
>>>>>>> const*, int, char const*, char const*, __va_list_tag*)+0x4a
>>>>>>> V  [libjvm.so+0x908cca]  report_vm_error(char const*, int, char
>>>>>>> const*, char const*, ...)+0xea
>>>>>>> V  [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*,
>>>>>>> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1
>>>>>>> V  [libjvm.so+0x88ec5a]
>>>>>>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a
>>>>>>> V  [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540
>>>>>>> V  [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9
>>>>>>> V  [libjvm.so+0x1264ac6]  JavaThread::run()+0x2a6
>>>>>>> V  [libjvm.so+0x10189aa]  java_start(Thread*)+0xca
>>>>>>> C  [libpthread.so.0+0x8182]  start_thread+0xc2
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nils
>>>>>>>
>>>>>>> On 2016-01-21 11:25, Nils Eliasson wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please review this small change. The diff looks big but most of the
>>>>>>>> change is just changing how the directives are
>>>>>>>> passed to the compilers. Directives are set in the ciEnv and then
>>>>>>>> passed to the compilers. The compilers can then
>>>>>>>> choose to add it to any internal compilation object for convenience.
>>>>>>>> The hs_err printing routine in vmError.cpp loads
>>>>>>>> the directive from the ciEnv.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756
>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nils
>>>>>>>
>>>>>
>>>
>

From vladimir.kozlov at oracle.com  Fri Apr  8 18:37:40 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Apr 2016 11:37:40 -0700
Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to
	shared code.
Message-ID: <5707FA74.5060207@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8153818
webrev:
http://cr.openjdk.java.net/~kvn/8153818/

Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp.

Regular testing.

Thanks,
Vladimir

From aleksey.shipilev at oracle.com  Fri Apr  8 18:42:14 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 8 Apr 2016 21:42:14 +0300
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <5707E0B5.5080501@oracle.com>
References: <5707E0B5.5080501@oracle.com>
Message-ID: <5707FB86.1020408@oracle.com>

On 04/08/2016 07:47 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/
> 
> https://bugs.openjdk.java.net/browse/JDK-8153540
> 
> Unsafe.allocateInstance intrinsic can instantiate arrays, but the
> allocation logic is broken.
> 
> The proposed fix is to perform necessary checks in Java code before
> calling the intrinsic.

I like Java-side fix better.

> I did some performance measurements [1] and the reflective (non-constant
> class) case regressed ~5-10% due to the new guards.

My quick perfasm runs seem to show this is because of a subtle difference:
  http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm
  http://cr.openjdk.java.net/~shade/8153540/patched.perfasm

If you compare these, then the difference seems to be the instruction
scheduling and a branch in the guards code.

Baseline:

  0.60%    0.53%    0x00007fafafd34dac: mov    0xc(%rsi),%r10d
  3.43%    4.47%    0x00007fafafd34db0: mov    0x50(%r10),%rsi
  0.30%    0.24%    0x00007fafafd34db4: movzbl 0x172(%rsi),%r8d
  1.40%    2.09%    0x00007fafafd34dbc: mov    0x8(%rsi),%r10d
  0.98%    1.48%    0x00007fafafd34dc0: add    $0xfffffffc,%r8d
  2.71%    4.28%    0x00007fafafd34dc4: mov    %r10d,%r11d
  0.03%    0.04%    0x00007fafafd34dc7: and    $0x1,%r11d
  1.02%    1.41%    0x00007fafafd34dcb: or     %r11d,%r8d
  2.51%    2.49%    0x00007fafafd34dce: test   %r8d,%r8d

Patched:

  0.59%    0.76%   0x00007fd1c4de1c2c: mov    0xc(%rsi),%r11d
  3.48%    3.83%   0x00007fd1c4de1c30: mov    0x50(%r11),%rsi
  0.35%    0.36%   0x00007fd1c4de1c34: mov    0x8(%rsi),%r10d
  1.47%    1.69%   0x00007fd1c4de1c38: test   %r10d,%r10d
                   0x00007fd1c4de1c3b: jl     0x00007fd1c4de1ce5
  1.18%    1.48%   0x00007fd1c4de1c41: movzbl 0x172(%rsi),%r11d
  2.82%    3.93%   0x00007fd1c4de1c49: mov    %r10d,%r9d
  0.01%    0.03%   0x00007fd1c4de1c4c: and    $0x1,%r9d
  0.33%    0.59%   0x00007fd1c4de1c50: add    $0xfffffffc,%r11d
  0.93%    0.92%   0x00007fd1c4de1c54: or     %r9d,%r11d
  2.65%    2.47%   0x00007fd1c4de1c57: test   %r11d,%r11d

Unfortunately, a simple fix of replacing "||" with "|" explodes the
generated code. Maybe something else is doable there.
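To make the `||` versus `|` point concrete, here is a Java sketch assuming a reject guard built from `Class.isArray()` and `Class.isPrimitive()` (the method names are hypothetical and this is not the code in the webrev). With `||` the second call runs only when the first returns false, so the source-level guard implies a branch between the two checks; with `|` both calls always run and their results are OR-ed. Both forms return the same value for every non-null class, so the choice only affects code shape, and in this case the `|` form made the generated code worse.

```java
public class GuardShapes {
    // Short-circuit form: isPrimitive() is evaluated only if isArray() is false.
    static boolean rejectShortCircuit(Class<?> cls) {
        return cls.isArray() || cls.isPrimitive();
    }

    // Non-short-circuit form: both calls always run and the results are OR-ed.
    static boolean rejectEager(Class<?> cls) {
        return cls.isArray() | cls.isPrimitive();
    }

    public static void main(String[] args) {
        Class<?>[] samples = { Object.class, int[].class, int.class, String.class, Runnable.class };
        for (Class<?> c : samples) {
            // Both forms agree on every input; only the evaluation order differs.
            System.out.println(c.getName() + " -> " + rejectShortCircuit(c) + " / " + rejectEager(c));
        }
    }
}
```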


> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java

Suggestions to improve fidelity:
  * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance
  * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on
@Benchmarks if you want to use -prof perfasm

Thanks,
-Aleksey



From igor.veresov at oracle.com  Fri Apr  8 19:18:48 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Fri, 8 Apr 2016 12:18:48 -0700
Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific
	code to shared code.
In-Reply-To: <5707FA74.5060207@oracle.com>
References: <5707FA74.5060207@oracle.com>
Message-ID: <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com>

Looks good.

igor

> On Apr 8, 2016, at 11:37 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> https://bugs.openjdk.java.net/browse/JDK-8153818
> webrev:
> http://cr.openjdk.java.net/~kvn/8153818/
> 
> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp.
> 
> Regular testing.
> 
> Thanks,
> Vladimir


From vladimir.kozlov at oracle.com  Fri Apr  8 19:21:31 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Apr 2016 12:21:31 -0700
Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific
	code to shared code.
In-Reply-To: <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com>
References: <5707FA74.5060207@oracle.com>
	<36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com>
Message-ID: <570804BB.5080001@oracle.com>

Thank you Igor for reviews.

Vladimir

On 4/8/16 12:18 PM, Igor Veresov wrote:
> Looks good.
>
> igor
>
>> On Apr 8, 2016, at 11:37 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153818
>> webrev:
>> http://cr.openjdk.java.net/~kvn/8153818/
>>
>> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp.
>>
>> Regular testing.
>>
>> Thanks,
>> Vladimir
>

From christian.thalinger at oracle.com  Fri Apr  8 20:14:30 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 8 Apr 2016 10:14:30 -1000
Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific
	code to shared code.
In-Reply-To: <5707FA74.5060207@oracle.com>
References: <5707FA74.5060207@oracle.com>
Message-ID: <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com>

Looks good.

> On Apr 8, 2016, at 8:37 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> https://bugs.openjdk.java.net/browse/JDK-8153818
> webrev:
> http://cr.openjdk.java.net/~kvn/8153818/
> 
> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp.
> 
> Regular testing.
> 
> Thanks,
> Vladimir


From vladimir.kozlov at oracle.com  Fri Apr  8 20:17:46 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Apr 2016 13:17:46 -0700
Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific
	code to shared code.
In-Reply-To: <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com>
References: <5707FA74.5060207@oracle.com>
	<348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com>
Message-ID: <570811EA.3040203@oracle.com>

Thank you, Chris, for reviews

Vladimir

On 4/8/16 1:14 PM, Christian Thalinger wrote:
> Looks good.
>
>> On Apr 8, 2016, at 8:37 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153818
>> webrev:
>> http://cr.openjdk.java.net/~kvn/8153818/
>>
>> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp.
>>
>> Regular testing.
>>
>> Thanks,
>> Vladimir
>

From rwestrel at redhat.com  Mon Apr 11 07:57:10 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 11 Apr 2016 09:57:10 +0200
Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86)
In-Reply-To: <5707EA4B.8030306@oracle.com>
References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com>
	<57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com>
	<570680BC.7030305@oracle.com> <5707B818.6070405@redhat.com>
	<5707EA4B.8030306@oracle.com>
Message-ID: <570B58D6.90408@redhat.com>


> Merged your changes here:
>   http://cr.openjdk.java.net/~shade/8152753/webrev.02/

That looks good to me.

Roland.


From vladimir.x.ivanov at oracle.com  Mon Apr 11 11:07:26 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 11 Apr 2016 14:07:26 +0300
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <5707EAA8.5020005@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707EAA8.5020005@oracle.com>
Message-ID: <570B856E.3000206@oracle.com>

Thanks for the feedback, Vladimir.

>  > Unsafe.allocateInstance intrinsic can instantiate arrays, but the
> allocation logic is broken.
>
> But it should not allocate arrays. Right? This is what your java changes
> do now.
Yes, instance allocation logic doesn't work for arrays. It allocates 
broken array instances. That's why I decided to filter out arrays.

>
> Should it be allocateInstance0 ?:
>
> + // public native Object Unsafe.allocateInstance(Class<?> cls);

It is:
      @HotSpotIntrinsicCandidate
-    public native Object allocateInstance(Class<?> cls)
-        throws InstantiationException;
+    private native Object allocateInstance0(Class<?> cls) throws InstantiationException;

>
> You removed the following check. Is it because the Java code will catch the NULL?:
>      Node* cls = null_check(argument(1));
>      if (stopped())  return true;
Yes, cls.isPrimitive() does null checks on both cls and klass.
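A minimal sketch of the Java-side filtering discussed in this thread, assuming the wrapper rejects primitive and array classes before reaching the native allocation. `allocateInstanceChecked` and its `allocator` parameter are hypothetical stand-ins for the public `Unsafe.allocateInstance` and the private `allocateInstance0`; the real webrev may phrase or order the checks differently. Because `cls.isPrimitive()` is called on `cls` first, a null argument fails with a NullPointerException without a separate check.

```java
import java.util.function.Function;

public class AllocGuard {
    // Hypothetical wrapper; 'allocator' stands in for the native allocateInstance0.
    static Object allocateInstanceChecked(Class<?> cls, Function<Class<?>, Object> allocator)
            throws InstantiationException {
        // cls.isPrimitive() throws NullPointerException for a null cls,
        // so no explicit null check is needed here.
        if (cls.isPrimitive() || cls.isArray()) {
            throw new InstantiationException(cls.getName());
        }
        return allocator.apply(cls);
    }

    public static void main(String[] args) throws Exception {
        Function<Class<?>, Object> fake = c -> "allocated " + c.getSimpleName();
        System.out.println(allocateInstanceChecked(StringBuilder.class, fake));
        try {
            allocateInstanceChecked(int[].class, fake);
        } catch (InstantiationException e) {
            System.out.println("rejected " + e.getMessage());
        }
    }
}
```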

>
> The test is missing the bug number (@bug).
Ok, will add.

Best regards,
Vladimir Ivanov

>
> Thanks,
> Vladimir K
>
> On 4/8/16 9:47 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/
>> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153540
>>
>> Unsafe.allocateInstance intrinsic can instantiate arrays, but the
>> allocation logic is broken.
>>
>> The proposed fix is to perform necessary checks in Java code before
>> calling the intrinsic.
>>
>> I did some performance measurements [1] and the reflective (non-constant
>> class) case regressed ~5-10%
>> due to the new guards.
>>
>> I also experimented with a hotspot-only fix [2], but it looks uglier.
>> So, if you consider the regression in reflective
>> case non-critical, I'd prefer to go with JDK checks.
>>
>> Testing: regression test, JPRT, RBT (pit-hs-comp; in progress),
>> microbenchmarks.
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
>>
>> Baseline:
>>    AllocInstance.testConstant    avgt   25  3.736 ± 0.054  ns/op
>>    AllocInstance.testReflective  avgt   25  5.880 ± 0.080  ns/op
>>
>> JDK fix:
>>    AllocInstance.testConstant    avgt   25  3.959 ± 0.205  ns/op
>>    AllocInstance.testReflective  avgt   25  6.274 ± 0.180  ns/op
>>
>> [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path
>>
>>    AllocInstance.testConstant    avgt   25  3.957 ± 0.159  ns/op
>>    AllocInstance.testReflective  avgt   25  5.901 ± 0.057  ns/op
>>

From felix.yang at linaro.org  Mon Apr 11 11:47:32 2016
From: felix.yang at linaro.org (Felix Yang)
Date: Mon, 11 Apr 2016 19:47:32 +0800
Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases
	for MaxINode & MinINode
In-Reply-To: <5707C35A.2060000@redhat.com>
References: <CACc5Y6S9XN30UOwyjo9gEtFmGyAAyCnTd0vLy_8qqEoVmJK+rQ@mail.gmail.com>
	<5707C35A.2060000@redhat.com>
Message-ID: <CACc5Y6Q4X8KTOKQ5WqfJ+Gtxen8f8LbspjunJNDO5XsPQFoMaA@mail.gmail.com>

Hi,

    Thanks for reviewing the patch.
    I find that the cases the patch tries to catch here are the result of
loop transformations.
    And it's hard to produce a test case for it simply using the
Math.min/max API (it seems C2 will not create a MaxINode/MinINode for calls
to these APIs).
    But I did notice some JTReg hotspot test cases that already generate
the pattern.

    Example JTReg test cases:
    hotspot/test/compiler/rangechecks/TestExplicitRangeChecks.java
    hotspot/test/compiler/rangechecks/TestBadFoldCompare.java
    hotspot/test/compiler/rangechecks/PowerOf2SizedArraysChecks.java
    hotspot/test/compiler/rangechecks/TestRangeCheckSmearing.java

    For the first test, I saw the following instructions in the C2 JIT code:
    $ grep "csel" JTwork/compiler/rangechecks/TestExplicitRangeChecks.jtr | grep zr | grep gt
    0x0000007f9e127e8c: csel      w14, w10, wzr, gt
    0x0000007f9e129568: csel      w11, w12, wzr, gt  ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
    0x0000007f9e147c78: csel      w12, w11, wzr, gt
    0x0000007f9e15c75c: csel      w12, w13, wzr, gt
    0x0000007f9e1689e4: csel      w16, w22, wzr, gt
    0x0000007f9e18a570: csel      w13, w11, wzr, gt
    0x0000007f7e13e278: csel      w12, w11, wzr, gt
    0x0000007f7e14f55c: csel      w13, w13, wzr, gt
    $ grep "csinc" JTwork/compiler/rangechecks/TestExplicitRangeChecks.jtr
    0x0000007f9e112860: csinc     w12, w12, wzr, gt
    0x0000007f9e120e40: csinc     w11, w11, wzr, gt
    0x0000007f7e114de0: csinc     w12, w12, wzr, gt
    0x0000007f7e123440: csinc     w11, w11, wzr, gt

    I also searched the C2 JIT code of specJBB2005 & Spark Terasort and I
saw the following csel/csinc/csinv generated with the patch:
  1. Spark Terasort:
  0x0000007f990ec898: csinc     w14, w0, wzr, le
  0x0000007f990f3a40: csinv     w13, w11, wzr, ge
  0x0000007f990f3a94: csinv     w11, w13, wzr, ge
  0x0000007f9912c1a8: csinc     w14, w13, wzr, le
  0x0000007f9912afe8: csinc     w12, w10, wzr, le  ;*aload_1
  0x0000007f99137f90: csinv     w12, w12, wzr, ge
  0x0000007f99137fe4: csinv     w13, w12, wzr, ge
  0x0000007f99132ff8: csinc     w11, w10, wzr, le  ;*aload_1
  0x0000007f9917fdfc: csinc     w12, w15, wzr, le
  0x0000007f991a5e3c: csinv     w11, w11, wzr, ge
  0x0000007f991a5e90: csinv     w12, w11, wzr, ge
  0x0000007f991133bc: csinc     w0, w12, wzr, le
  0x0000007f9918e548: csinc     w12, w18, wzr, le  ;*aload_0
  0x0000007f991639f8: csinc     w16, w12, wzr, le
  0x0000007f99115508: csinc     w3, w13, wzr, le
  0x0000007f991d7e38: csinc     w13, w14, wzr, le  ;*aload_1
  0x0000007f992f7e48: csinc     w12, w13, wzr, le  ;*aload_0
  0x0000007f992dd578: csinv     w13, w10, wzr, ge
  0x0000007f992e7370: csinv     w17, w14, wzr, ge
  0x0000007f99222ec8: csinc     w12, w13, wzr, le  ;*aload_0
  0x0000007f993ae208: csinc     w10, w15, wzr, le
  0x0000007f99405604: csinc     w10, w13, wzr, le
  0x0000007f9931da84: csinc     w11, w13, wzr, le
  0x0000007f9941eb04: csinv     w15, w11, wzr, ge  ;*lload_0
  0x0000007f9941ec4c: csinv     w15, w14, wzr, ge  ;*iload
  0x0000007f994a3110: csinc     w11, w13, wzr, le
  0x0000007f990e78d8: csel      w11, w11, wzr, gt
  0x0000007f990dc8e0: csel      w11, w12, wzr, gt
  0x0000007f990dc8f0: csel      w11, w11, wzr, gt
  0x0000007f990f5f00: csel      w12, w12, wzr, gt
  0x0000007f990ff83c: csel      w11, w10, wzr, gt
  0x0000007f9914e0dc: csel      w13, w12, wzr, gt
  0x0000007f991504f0: csel      w10, w11, wzr, gt
  0x0000007f9918cbac: csel      w20, w20, wzr, gt
  0x0000007f991a32dc: csel      w10, w10, wzr, gt
  0x0000007f990f3a64: csel      w13, w13, wzr, gt
  0x0000007f990f3ad0: csel      w10, w10, wzr, gt
  0x0000007f99114d98: csel      w16, w14, wzr, gt
  0x0000007f991f0434: csel      w0, w10, wzr, gt
  0x0000007f9920753c: csel      w10, w11, wzr, gt
  0x0000007f9920754c: csel      w10, w10, wzr, gt
  0x0000007f99213270: csel      w12, w12, wzr, gt
  0x0000007f9923a9f8: csel      w18, w15, wzr, gt
  0x0000007f99210ad4: csel      w11, w11, wzr, gt
  0x0000007f9926a524: csel      w12, w11, wzr, gt
  0x0000007f9929e3d0: csel      w10, w11, wzr, gt
  0x0000007f9929e3ec: csel      w11, w11, wzr, gt
  0x0000007f992a6c90: csel      w11, w12, wzr, gt
  0x0000007f99214044: csel      w13, w11, wzr, gt
  0x0000007f99242d04: csel      w11, w12, wzr, gt
  0x0000007f99420260: csel      w10, w11, wzr, gt  ;*checkcast
  0x0000007f993b1d14: csel      w12, w12, wzr, gt
  0x0000007f993fe15c: csel      w11, w11, wzr, gt  ;*checkcast
  0x0000007f99460688: csel      w10, w11, wzr, gt

  2. specJBB2005:
  0x0000007f7e1233e0: csinc     w12, w12, wzr, gt
  0x0000007f7e1632f8: csinc     w10, w10, wzr, gt
  0x0000007f7e1c8a1c: csinv     w10, w2, wzr, ge
  0x0000007f7e14d2cc: csel      w0, w12, wzr, gt
  0x0000007f7e19982c: csel      w10, w10, wzr, gt  ;*baload {reexecute=0 rethrow=0 return_oop=0}
  0x0000007f7e1bad4c: csel      w12, w3, wzr, gt
  0x0000007f7e27e25c: csel      w10, w10, wzr, gt  ;*baload {reexecute=0 rethrow=0 return_oop=0}

    So the patch got tested for the most part and is not causing us trouble.
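For readers decoding the listings above, here is a small Java model of what these instructions compute when the zero register is the second source (CSEL picks the first source or `wzr`, CSINC picks it or `wzr + 1`, CSINV picks it or `~wzr`). The model assumes each instruction is preceded by a signed compare of its first source register against zero. The compares are not shown in the listings, so that part is an assumption. Under it, `csel gt` is max(x, 0), `csinc gt` is max(x, 1), `csinc le` is min(x, 1) and `csinv ge` is max(x, -1). The method names are hypothetical.

```java
public class CondSelectModel {
    // csel  d, x, wzr, gt  after "cmp x, #0": d = (x > 0) ? x : 0
    static int cselGt(int x)  { return x > 0 ? x : 0; }

    // csinc d, x, wzr, gt  after "cmp x, #0": d = (x > 0) ? x : 0 + 1
    static int csincGt(int x) { return x > 0 ? x : 0 + 1; }

    // csinc d, x, wzr, le  after "cmp x, #0": d = (x <= 0) ? x : 0 + 1
    static int csincLe(int x) { return x <= 0 ? x : 0 + 1; }

    // csinv d, x, wzr, ge  after "cmp x, #0": d = (x >= 0) ? x : ~0, i.e. -1
    static int csinvGe(int x) { return x >= 0 ? x : ~0; }

    public static void main(String[] args) {
        int[] samples = { Integer.MIN_VALUE, -2, -1, 0, 1, 2, Integer.MAX_VALUE };
        for (int x : samples) {
            check(cselGt(x) == Math.max(x, 0));
            check(csincGt(x) == Math.max(x, 1));
            check(csincLe(x) == Math.min(x, 1));
            check(csinvGe(x) == Math.max(x, -1));
        }
        System.out.println("all identities hold");
    }

    static void check(boolean ok) {
        if (!ok) throw new AssertionError();
    }
}
```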

On 8 April 2016 at 22:42, Andrew Haley <aph at redhat.com> wrote:

> On 04/08/2016 03:36 PM, Felix Yang wrote:
> >  Is it OK?
>
> It looks good, but I'm surely going to need a jtreg test case
> which exercises all these patterns in C2.
>
> Thanks,
>
> Andrew.
>
>

From vladimir.x.ivanov at oracle.com  Mon Apr 11 11:50:03 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 11 Apr 2016 14:50:03 +0300
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <5707FB86.1020408@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
Message-ID: <570B8F6B.2030206@oracle.com>

Thanks for the feedback, Aleksey.

>> I did some performance measurements [1] and the reflective (non-constant
>> class) case regressed ~5-10% due to the new guards.
>
> My quick perfasm runs seem to show this is because of a subtle difference:
>    http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm
>    http://cr.openjdk.java.net/~shade/8153540/patched.perfasm
>
> If you compare these, then the difference seems to be the instruction
> scheduling and a branch in the guards code.
>
> Baseline:
...
> Patched:
...
>
> Unfortunately, a simple fix of replacing "||" with "|" explodes the
> generated code. Maybe something else is doable there.

Yes, C2 can't fuse Class.isArray with the slow-path bit check from
Klass::layout_helper, partly because they dispatch to different
places: the !Class.isArray() case dispatches to explicit exception
instantiation, while the slow path calls into the runtime.

An additional flag in the mirror (j.l.Class) marking instance klasses
could help here, but I'm still not sure it's worth the effort.

Ideally, something like [1] (which requires 2 new intrinsics):

   * isFastAllocatable() performs all necessary checks: null checks on 
cls, not primitive, not array, not interface, not abstract, fully 
initialized, no finalizers;

   * allocateInstanceSlow() handles all cases the intrinsic doesn't
handle: either throws IE or does necessary operations (e.g., initialize 
the class or register a finalizer) when instantiating an object.
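As a rough, plain-Java approximation of the checks listed for the proposed isFastAllocatable(), the sketch below uses only public reflection. It is an illustration under stated assumptions, not the intrinsic. The null check comes from calling a method on `cls`. The "fully initialized" condition is not observable through public reflection and is omitted. The finalizer check assumes that a declared no-argument `finalize()` anywhere below `Object` in the superclass chain means the class has a finalizer. All names other than the JDK ones are hypothetical.

```java
import java.lang.reflect.Modifier;

public class FastAllocCheck {
    // Approximates the listed conditions with public reflection only.
    // The "fully initialized" condition is NOT checked here.
    static boolean isFastAllocatable(Class<?> cls) {
        // Calling a method on cls doubles as the null check (NPE for null).
        if (cls.isPrimitive() || cls.isArray()) {
            return false;
        }
        if (cls.isInterface() || Modifier.isAbstract(cls.getModifiers())) {
            return false;
        }
        return !hasFinalizer(cls);
    }

    // True if cls or any superclass other than Object declares finalize().
    static boolean hasFinalizer(Class<?> cls) {
        for (Class<?> c = cls; c != null && c != Object.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("finalize");
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return false;
    }

    static class Plain { }

    static abstract class Base { }

    static class WithFinalizer {
        @Override
        protected void finalize() { }
    }

    static class InheritsFinalizer extends WithFinalizer { }

    public static void main(String[] args) {
        System.out.println(isFastAllocatable(Plain.class));             // true
        System.out.println(isFastAllocatable(Base.class));              // false: abstract
        System.out.println(isFastAllocatable(Runnable.class));          // false: interface
        System.out.println(isFastAllocatable(int[].class));             // false: array
        System.out.println(isFastAllocatable(InheritsFinalizer.class)); // false: finalizer
    }
}
```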

Best regards,
Vladimir Ivanov

[1]
     @ForceInline
     public Object allocateInstance(Class<?> cls) throws InstantiationException {
         // Interfaces and abstract classes are handled by the intrinsic.
         if (isFastAllocatable(cls)) {
             return allocateInstance0(cls);
         } else {
             return allocateInstanceSlow(cls);
         }
     }

     @HotSpotIntrinsicCandidate
     private native boolean isFastAllocatable(Class<?> cls);

     @HotSpotIntrinsicCandidate
     private native Object allocateInstance0(Class<?> cls) throws InstantiationException;

     // Calls into modified OptoRuntime::new_instance_C
     @HotSpotIntrinsicCandidate
     private native Object allocateInstanceSlow(Class<?> cls) throws InstantiationException;


>
>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
>
> Suggestions to improve fidelity:
>    * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance
>    * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on
> @Benchmarks if you want to use -prof perfasm
>
> Thanks,
> -Aleksey
>
>

From aph at redhat.com  Mon Apr 11 11:56:02 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 11 Apr 2016 12:56:02 +0100
Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases
	for MaxINode & MinINode
In-Reply-To: <CACc5Y6Q4X8KTOKQ5WqfJ+Gtxen8f8LbspjunJNDO5XsPQFoMaA@mail.gmail.com>
References: <CACc5Y6S9XN30UOwyjo9gEtFmGyAAyCnTd0vLy_8qqEoVmJK+rQ@mail.gmail.com>
	<5707C35A.2060000@redhat.com>
	<CACc5Y6Q4X8KTOKQ5WqfJ+Gtxen8f8LbspjunJNDO5XsPQFoMaA@mail.gmail.com>
Message-ID: <570B90D2.5020900@redhat.com>

Hi,

On 04/11/2016 12:47 PM, Felix Yang wrote:
> 
>     Thanks for reviewing the patch.

>     I find that the cases the patch tries to catch here are the
>     result of loop transformations.
>     And it's hard to produce a test case for it simply using the
>     Math.min/max API (it seems C2 will not create a MaxINode/MinINode
>     for calls to these APIs). But I did notice some JTReg hotspot
>     test cases that already generate the pattern.
> 
>     So the patch got tested for the most part and is not causing us trouble.

Sure, but that doesn't necessarily give us test coverage of the
changes you've made.  Is the problem that you don't know how to write
test cases to cover your changes?

Andrew.


From vladimir.kempik at oracle.com  Mon Apr 11 12:00:28 2016
From: vladimir.kempik at oracle.com (Vladimir Kempik)
Date: Mon, 11 Apr 2016 15:00:28 +0300
Subject: [8u] RFR 8130309: need to bailout cleanly if
	CompiledStaticCall::emit_to_interp_stub fails when codecache is out of
	space
Message-ID: <570B91DC.2040904@oracle.com>

Hello

Please review this backport of 8130309 to jdk8u.

Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope.

Testing: jprt, failing test.

Bug: https://bugs.openjdk.java.net/browse/JDK-8130309
Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/

Thanks
-Vladimir


From nils.eliasson at oracle.com  Mon Apr 11 12:09:29 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 11 Apr 2016 14:09:29 +0200
Subject: RFR(S): 8153885: [TESTBUG] few regression tests failed after 8151880
	changes
Message-ID: <570B93F9.5030005@oracle.com>

Hi,

Please review this fix.

Summary:
1) Add -XX:-UseCounterDecay to three tests that were using the
compile()-method through the getBCI method.
2) Fix CompileCommand patterns and comments still using the old 
SimpleTestCase$Helper pattern.

Testing:
Running all compiler JTREG tests on linux-x64.

Bug: https://bugs.openjdk.java.net/browse/JDK-8153885
Webrev: http://cr.openjdk.java.net/~neliasso/8153885/webrev.01/

Regards,
Nils Eliasson

From tobias.hartmann at oracle.com  Mon Apr 11 12:26:43 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Apr 2016 14:26:43 +0200
Subject: [8u] RFR 8130309: need to bailout cleanly if
	CompiledStaticCall::emit_to_interp_stub fails when codecache is out of
	space
In-Reply-To: <570B91DC.2040904@oracle.com>
References: <570B91DC.2040904@oracle.com>
Message-ID: <570B9803.2030509@oracle.com>

Hi Vladimir,

On 11.04.2016 14:00, Vladimir Kempik wrote:
> Hello
> 
> Please review this backport of 8130309 to jdk8u.
> 
> Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope.
> 
> Testing: jprt, failing test.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309
> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/

Looks good to me. Thanks for backporting this!

Best regards,
Tobias

> 
> Thanks
> -Vladimir
> 

From felix.yang at linaro.org  Mon Apr 11 12:48:02 2016
From: felix.yang at linaro.org (Felix Yang)
Date: Mon, 11 Apr 2016 20:48:02 +0800
Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases
	for MaxINode & MinINode
In-Reply-To: <570B90D2.5020900@redhat.com>
References: <CACc5Y6S9XN30UOwyjo9gEtFmGyAAyCnTd0vLy_8qqEoVmJK+rQ@mail.gmail.com>
	<5707C35A.2060000@redhat.com>
	<CACc5Y6Q4X8KTOKQ5WqfJ+Gtxen8f8LbspjunJNDO5XsPQFoMaA@mail.gmail.com>
	<570B90D2.5020900@redhat.com>
Message-ID: <CACc5Y6SE3V5PhVWQUsJ5BtqrPC8gPzgc2NLL5Y0TEQc+njNg0w@mail.gmail.com>

Hi,

    Yes, I haven't looked into the details of the loop transformation code
and find it hard to produce a test case, at least for now.
    It seems necessary to me to do so in order to produce a good JTReg
test case for the patch.
    But I am not quite sure I can come up with a test case that covers
all the added templates.
    Any suggestions?
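If it helps as a starting point, here is one possible shape for such a test, written as a jtreg-style Java sketch. It is hypothetical throughout (class name, loop shapes and iteration count). It runs clamps against 0, 1 and -1, plus an offset-indexed array loop where range check elimination has to adjust loop limits, in hot code, and compares the results with expected values written out by hand. Whether C2 actually turns any of these into MaxINode/MinINode, and therefore into the new csel/csinc/csinv rules, is not established here. That would need to be confirmed, for example with -XX:+PrintOptoAssembly on an AArch64 debug build.

```java
/*
 * @test
 * @summary Sketch: clamps against 0, 1 and -1 plus offset array loops in hot code
 * @run main/othervm -Xbatch MinMaxSpecialCases
 */
public class MinMaxSpecialCases {
    static final int[] XS     = { Integer.MIN_VALUE, -5, -1, 0, 1, 5, Integer.MAX_VALUE };
    // Expected results, written out so they do not depend on JIT-compiled code.
    static final int[] MAX_0  = { 0, 0, 0, 0, 1, 5, Integer.MAX_VALUE };
    static final int[] MAX_1  = { 1, 1, 1, 1, 1, 5, Integer.MAX_VALUE };
    static final int[] MIN_1  = { Integer.MIN_VALUE, -5, -1, 0, 1, 1, 1 };
    static final int[] MAX_M1 = { -1, -1, -1, 0, 1, 5, Integer.MAX_VALUE };

    static int max0(int x)  { return x > 0 ? x : 0; }
    static int max1(int x)  { return x > 0 ? x : 1; }
    static int min1(int x)  { return x <= 0 ? x : 1; }
    static int maxM1(int x) { return x >= 0 ? x : -1; }

    // Offset indexing makes range check elimination adjust the loop limits.
    static int sumWithOffset(int[] a, int from, int to, int off) {
        int s = 0;
        for (int i = from; i < to; i++) {
            s += a[i + off];
        }
        return s;
    }

    static void check(boolean ok, String what) {
        if (!ok) throw new RuntimeException("mismatch: " + what);
    }

    public static void main(String[] args) {
        int[] a = new int[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        // Assumed to be enough iterations for these methods to be C2-compiled.
        for (int iter = 0; iter < 20_000; iter++) {
            for (int k = 0; k < XS.length; k++) {
                int x = XS[k];
                check(max0(x) == MAX_0[k], "max(x, 0)");
                check(max1(x) == MAX_1[k], "max(x, 1)");
                check(min1(x) == MIN_1[k], "min(x, 1)");
                check(maxM1(x) == MAX_M1[k], "max(x, -1)");
            }
            check(sumWithOffset(a, 0, 90, 10) == 4905, "sum a[10..99]");
            check(sumWithOffset(a, -3, 50, 3) == 1378, "sum a[0..52]");
        }
        System.out.println("PASSED");
    }
}
```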

Thanks,
Felix


On 11 April 2016 at 19:56, Andrew Haley <aph at redhat.com> wrote:

> Hi,
>
> On 04/11/2016 12:47 PM, Felix Yang wrote:
> >
> >     Thanks for reviewing the patch.
>
> >     I find that the cases the patch tries to catch here are the
> >     result of loop transformations.
> >     And it's hard to produce a test case for it simply using the
> >     Math.min/max API.  (It seems C2 will not create a MaxINode/MinINode
> >     for a call to these APIs.) But I did notice some JTReg hotspot
> >     test cases that already generate the pattern.
> >
> >     So the patch got tested for the most part and is not causing us
> >     trouble.
>
> Sure, but that doesn't necessarily give us test coverage of the
> changes you've made.  Is the problem that you don't know how to write
> test cases to cover your changes?
>
> Andrew.
>
>
>

From pavel.punegov at oracle.com  Mon Apr 11 14:02:06 2016
From: pavel.punegov at oracle.com (Pavel Punegov)
Date: Mon, 11 Apr 2016 17:02:06 +0300
Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package
In-Reply-To: <5707E49D.4040400@oracle.com>
References: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com>
	<5707E49D.4040400@oracle.com>
Message-ID: <35C50286-0D4C-4F3E-8862-FFDA32334390@oracle.com>

Thanks for review, Vladimir

> On 08 Apr 2016, at 20:04, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 4/8/16 7:40 AM, Pavel Punegov wrote:
>> Hi,
>> 
>> please review the following change to JITtester:
>> - rewrite TypeUtil and move to utils package
>> - add javadoc for each method in the class
>> 
>> bug: https://bugs.openjdk.java.net/browse/JDK-8153852
>> webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/
>> 
>> Thanks,
>> Pavel Punegov
>> 


From pavel.punegov at oracle.com  Mon Apr 11 14:02:57 2016
From: pavel.punegov at oracle.com (Pavel Punegov)
Date: Mon, 11 Apr 2016 17:02:57 +0300
Subject: RFR(XS): 8140354: Unquarantine tests that failed with
	OutOfMemoryError
In-Reply-To: <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com>
References: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com>
	<2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com>
Message-ID: <1940774F-D052-4995-97A7-3E2485229CA3@oracle.com>

Thanks for review, Igor

> On 07 Apr 2016, at 16:43, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
> 
> Pavel,
> 
> looks good to me
> 
> Igor
>> On Apr 7, 2016, at 4:32 PM, Pavel Punegov <pavel.punegov at oracle.com> wrote:
>> 
>> Hi,
>> 
>> please review this fix to unquarantine the CompilerControl tests now that JDK-8140354 [1] has been closed as a duplicate of JDK-8144621 [2].
>> The latter fixed the main issue that caused the OOME in the tests: it disabled generation of *.* patterns for compile commands like "print", which produced a lot of output from the test VM.
>> 
>> [1] https://bugs.openjdk.java.net/browse/JDK-8140354
>> [2] https://bugs.openjdk.java.net/browse/JDK-8144621
>> --
>> webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/
>> bug https://bugs.openjdk.java.net/browse/JDK-8153661
>> 
>> Pavel.
>> 
> 


From vladimir.kozlov at oracle.com  Mon Apr 11 19:13:42 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Apr 2016 12:13:42 -0700
Subject: RFR(S): 8153885: [TESTBUG] few regression tests failed after
	8151880 changes
In-Reply-To: <570B93F9.5030005@oracle.com>
References: <570B93F9.5030005@oracle.com>
Message-ID: <570BF766.9020300@oracle.com>

Looks good.

thanks,
Vladimir

On 4/11/16 5:09 AM, Nils Eliasson wrote:
> Hi,
>
> Please review this fix.
>
> Summary:
> 1) Add -XX:-UseCounterDecay to three tests that were using the
> compile()-method through the getBCI method.
> 2) Fix CompileCommand patterns and comments still using the old
> SimpleTestCase$Helper pattern.
>
> Testing:
> Running all compiler JTREG tests on linux-x64.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153885
> Webrev: http://cr.openjdk.java.net/~neliasso/8153885/webrev.01/
>
> Regards,
> Nils Eliasson

From christian.thalinger at oracle.com  Mon Apr 11 22:09:24 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 11 Apr 2016 12:09:24 -1000
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <570B8F6B.2030206@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
	<570B8F6B.2030206@oracle.com>
Message-ID: <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>


> On Apr 11, 2016, at 1:50 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Thanks for the feedback, Aleksey.
> 
>>> I did some performance measurements [1] and the reflection (non-constant
>>> class) case regressed ~5-10% due to the new guards.
>> 
>> My quick perfasm runs seem to show this is because of a subtle difference:
>>   http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm
>>   http://cr.openjdk.java.net/~shade/8153540/patched.perfasm
>> 
>> If you compare these, then the difference seems to be the instruction
>> scheduling and a branch in the guards code.
>> 
>> Baseline:
> ...
>> Patched:
> ...
>> 
>> Unfortunately, a simple fix of replacing "||" with "|" explodes the
>> generated code. Maybe something else is doable there.
> 
> Yes, C2 can't fuse Class.isArray with the slow-bit check from Klass::layout_helper (partly because they dispatch to different places: the !Class.isArray() case dispatches to explicit exception instantiation, while the slow path calls into the runtime).
> 
> Additional flag in a mirror (j.l.Class) which marks instance klasses could help here, but I'm still not sure it's worth the effort.
> 
> Ideally, something like [1] (which requires 2 new intrinsics):

I would advise against that.  We are fixing a long-standing bug here and although we see a regression the code we produced before was just wrong.  Comparing against something that was wrong in the first place is moot.

Take the hit; I doubt it will show up in customer applications.

> 
>  * isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers;
> 
>  * allocateInstanceSlow() handles all cases the intrinsic doesn't handle: either throws IE or does necessary operations (e.g., initialize the class or register a finalizer) when instantiating an object.
> 
> Best regards,
> Vladimir Ivanov
> 
> [1]
>    @ForceInline
>    public Object allocateInstance(Class<?> cls) throws InstantiationException {
>        // Interfaces and abstract classes are handled by the intrinsic.
>        if (isFastAllocatable(cls)) {
>            return allocateInstance0(cls);
>        } else {
>            return allocateInstanceSlow(cls);
>        }
>    }
> 
>    @HotSpotIntrinsicCandidate
>    private native boolean isFastAllocatable(Class<?> cls);
> 
>    @HotSpotIntrinsicCandidate
>    private native Object allocateInstance0(Class<?> cls) throws InstantiationException;
> 
>    // Calls into modified OptoRuntime::new_instance_C
>    @HotSpotIntrinsicCandidate
>    private native Object allocateInstanceSlow(Class<?> cls) throws InstantiationException;
> 
> 
>> 
>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
>> 
>> Suggestions to improve fidelity:
>>   * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance
>>   * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on
>> @Benchmarks if you want to use -prof perfasm
>> 
>> Thanks,
>> -Aleksey
>> 
>> 


From igor.ignatyev at oracle.com  Tue Apr 12 00:03:30 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 11 Apr 2016 17:03:30 -0700
Subject: RFR(XS) : 8152376 : [TESTBUG]
	compiler/floatingpoint/Test15FloatJNIArgs should use run
	main/othervm/native
Message-ID: <53A08B91-81FD-4C36-9FF2-780AAE3D99CB@oracle.com>

http://cr.openjdk.java.net/~iignatyev/8152376/webrev.00/
> 3 lines changed: 0 ins; 0 del; 3 mod;


Hi all,

could you please review this small fix, which adds the '/native' option to all main actions? The test uses a native library, and all such tests should be marked with the '/native' option so that jtreg can handle them accordingly.

JBS: https://bugs.openjdk.java.net/browse/JDK-8152376
webrev: http://cr.openjdk.java.net/~iignatyev/8152376/webrev.00/

Thanks,
Igor

From martin.doerr at sap.com  Tue Apr 12 09:45:54 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 12 Apr 2016 09:45:54 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
	<f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
	<570632FF.7090103@redhat.com>
	<805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap>
Message-ID: <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap>

Hi,

I think we have come to a common understanding and there was no complaint about my latest webrev:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/

Can I consider it reviewed?
Can somebody sponsor, please?

Thanks and best regards,
Martin


-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin
Sent: Donnerstag, 7. April 2016 12:52
To: Andrew Haley <aph at redhat.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Hi Andrew, Jamsheed and all,

thank you very much for your input.

Andrew, Jamsheed and I agree that it's better to have a releasing store in increment_count().
Therefore, I have replaced the storestore barrier introduced with JDK-8143897 with it (even though that barrier was also correct).

My change still contains a releasing store for newly created ExceptionCache instances.
As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms.
I think having the release doesn't hurt too much and makes the design a little cleaner.
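A minimal sketch of the publication pattern discussed in this thread, written with C++11 std::atomic rather than HotSpot's own OrderAccess primitives: the writer, serialized by a lock, fills an entry and then publishes the new count with a releasing store, while lock-free readers load the count with acquire semantics. The class name, the fixed capacity and the int-based pc/handler fields are hypothetical simplifications, not the real ExceptionCache layout. Unlike a plain volatile field, std::atomic makes the concurrent read of the count well-defined under the C++11 memory model.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical fixed-size cache, loosely modelled on the discussion; not HotSpot code.
class ExceptionCacheSketch {
 public:
  static constexpr int kCapacity = 16;

  // Writers are serialized by the lock, so a relaxed read of _count is enough here.
  bool add(int pc, int handler) {
    std::lock_guard<std::mutex> guard(_lock);
    int n = _count.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;
    _pcs[n] = pc;            // fill the entry first ...
    _handlers[n] = handler;
    // ... then publish it: the release store orders the entry writes before the
    // new count becomes visible, so a reader that sees n + 1 also sees the entry.
    _count.store(n + 1, std::memory_order_release);
    return true;
  }

  // Lock-free reader: the acquire load pairs with the release store in add().
  // Returns -1 if no handler is cached for pc.
  int lookup(int pc) const {
    int n = _count.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (_pcs[i] == pc) return _handlers[i];
    }
    return -1;
  }

 private:
  std::mutex _lock;
  std::atomic<int> _count{0};
  int _pcs[kCapacity] = {};
  int _handlers[kCapacity] = {};
};
```

Readers only ever touch entries below the count they acquired, and writers only write the slot just past the published count, so no entry is read and written concurrently.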

I also added comments based on your input.

The new webrev is here:
http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/

Please review. I will also need a sponsor from Oracle, please.

Thanks again and best regards,
Martin


-----Original Message-----
From: Andrew Haley [mailto:aph at redhat.com] 
Sent: Donnerstag, 7. April 2016 12:14
To: Doerr, Martin <martin.doerr at sap.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

On 07/04/16 10:08, Doerr, Martin wrote:

> An atomic update of _count would only be required if there were
> multiple threads attempting to increment it
> concurrently. However, updates are done under a lock, so we only have
> concurrent readers, which is ok.
> 
> I still think "volatile" does what we need here. Especially the xlC
> compiler on AIX tends to reload variables from memory. Exactly this
> can be prevented by making the field volatile.

I think your latest patch is OK.  Whether volatile is really good
enough, I don't know.  The new(ish) C++ memory model treats this as a
race, and therefore undefined behaviour.  Old C++ didn't have a memory
model, so the best we can do with racy code is guess about what our
compilers might do.

I certainly much prefer a release_store to the storestore fence used
in the fix for 8143897.

Andrew.

From felix.yang at linaro.org  Tue Apr 12 10:49:33 2016
From: felix.yang at linaro.org (Felix Yang)
Date: Tue, 12 Apr 2016 18:49:33 +0800
Subject: RFR: 8153713 : aarch64: improve short array clearing using store
	pair
In-Reply-To: <57067E32.3010403@redhat.com>
References: <CACc5Y6TJb-X1AM9sq9-iLyUWD1sPvsPhy2BbcxJf70rNSXfrFg@mail.gmail.com>
	<57067E32.3010403@redhat.com>
Message-ID: <CACc5Y6QPZ-57Cwv5g=+QCoyLRjji8VT1_ZS=_mhXj36-5u2NQg@mail.gmail.com>

Done.  New webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.01

Tested with JTreg hotspot, langtools and jdk.

Thanks,
Felix

On 7 April 2016 at 23:35, Andrew Haley <aph at redhat.com> wrote:

> On 04/07/2016 04:01 PM, Felix Yang wrote:
> >     Please review webrev:
> http://cr.openjdk.java.net/~fyang/8153713/webrev.00/
> >     JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713
> >
> >     Currently, the C2 compiler generates independent stores to clear
> >     short arrays whose size is no bigger than the parameter
> >     InitArrayShortSize (see the ClearArrayNode::Ideal function).
> >     For the aarch64 port, we have a store pair instruction which can
> >     zero two memory words at a time, and this will be good for code
> >     size and maybe performance on some micro-architectures.
> >
> >     For Spark Terasort, an extra 550 stp (xzr, xzr) instructions
> >     are generated with the patch, which means about a 2KB reduction in
> >     codesize.
> >
> >     Tested with JTreg hotspot, langtools and jdk.  Is it OK?
>
> It looks reasonable.  It's rather a big slab of code for aarch64.ad,
> and I think that it should be in MacroAssembler.  Long Chen created
> MacroAssembler::zero_words, and you should create an overload of
> zero_words which takes a constant int as an argument.
>
> Andrew.
>
>
>
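The code-size arithmetic can be illustrated with a portable model of clearing a constant number of 8-byte words: pairs of words are cleared together (standing in for `stp xzr, xzr`), and an odd trailing word gets a single store (standing in for `str xzr`). The helper names are hypothetical and this is not the MacroAssembler::zero_words code. It assumes each stp replaces two single stores and that every A64 instruction is 4 bytes, so 550 pairs save 550 x 4 = 2200 bytes, in line with the "about 2KB" figure quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts of the "instructions" a constant-length clear would need.
struct ZeroPlan {
  std::size_t pairs;    // models stp xzr, xzr, [base, #off]
  std::size_t singles;  // models str xzr, [base, #off]
};

// Hypothetical helper: clears `cnt` 8-byte words at `base` two at a time,
// with one single store for an odd trailing word, and reports the counts.
inline ZeroPlan zero_words_const(std::uint64_t* base, std::size_t cnt) {
  ZeroPlan plan{0, 0};
  std::size_t i = 0;
  for (; i + 1 < cnt; i += 2) {  // two words per "stp"
    base[i] = 0;
    base[i + 1] = 0;
    plan.pairs++;
  }
  if (i < cnt) {                 // odd tail: one "str"
    base[i] = 0;
    plan.singles++;
  }
  return plan;
}

// Bytes saved compared with emitting one 4-byte "str" per word.
inline std::size_t bytes_saved(const ZeroPlan& p) {
  const std::size_t kInsnBytes = 4;  // fixed-width A64 encoding
  return p.pairs * kInsnBytes;       // each stp replaces two str
}
```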

From vladimir.x.ivanov at oracle.com  Tue Apr 12 11:07:46 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Apr 2016 14:07:46 +0300
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
	<570B8F6B.2030206@oracle.com>
	<06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>
Message-ID: <570CD702.4070909@oracle.com>


>> Additional flag in a mirror (j.l.Class) which marks instance klasses could help here, but I'm still not sure it's worth the effort.
>>
>> Ideally, something like [1] (which requires 2 new intrinsics):
>
> I would advise against that.  We are fixing a long-standing bug here and although we see a regression the code we produced before was just wrong.  Comparing against something that was wrong in the first place is moot.

It wasn't intended as a call for action, but more like a backup plan if 
there's a need to speed up the reflection case.

I'd like to keep the fix simple and current version looks good enough:
   http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00

Any Reviews, please?

Best regards,
Vladimir Ivanov

>
> Take the hit; I doubt it will show up in customer applications.
>
>>
>>   * isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers;
>>
>>   * allocateInstanceSlow() handles all cases the intrinsic doesn't handle: either throws IE or does necessary operations (e.g., initialize the class or register a finalizer) when instantiating an object.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1]
>>     @ForceInline
>>     public Object allocateInstance(Class<?> cls) throws InstantiationException {
>>         // Interfaces and abstract classes are handled by the intrinsic.
>>         if (isFastAllocatable(cls)) {
>>             return allocateInstance0(cls);
>>         } else {
>>             return allocateInstanceSlow(cls);
>>         }
>>     }
>>
>>     @HotSpotIntrinsicCandidate
>>     private native boolean isFastAllocatable(Class<?> cls);
>>
>>     @HotSpotIntrinsicCandidate
>>     private native Object allocateInstance0(Class<?> cls) throws InstantiationException;
>>
>>     // Calls into modified OptoRuntime::new_instance_C
>>     @HotSpotIntrinsicCandidate
>>     private native Object allocateInstanceSlow(Class<?> cls) throws InstantiationException;
>>
>>
>>>
>>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
>>>
>>> Suggestions to improve fidelity:
>>>    * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance
>>>    * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on
>>> @Benchmarks if you want to use -prof perfasm
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>
>

From aph at redhat.com  Tue Apr 12 12:30:22 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 12 Apr 2016 13:30:22 +0100
Subject: RFR: 8153713 : aarch64: improve short array clearing using store
	pair
In-Reply-To: <CACc5Y6QPZ-57Cwv5g=+QCoyLRjji8VT1_ZS=_mhXj36-5u2NQg@mail.gmail.com>
References: <CACc5Y6TJb-X1AM9sq9-iLyUWD1sPvsPhy2BbcxJf70rNSXfrFg@mail.gmail.com>
	<57067E32.3010403@redhat.com>
	<CACc5Y6QPZ-57Cwv5g=+QCoyLRjji8VT1_ZS=_mhXj36-5u2NQg@mail.gmail.com>
Message-ID: <570CEA5E.2040208@redhat.com>

On 04/12/2016 11:49 AM, Felix Yang wrote:
> Done.  New webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.01

That looks fine.  Thanks.

Andrew.


From nils.eliasson at oracle.com  Tue Apr 12 13:30:19 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 12 Apr 2016 15:30:19 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <5707E5C7.3000000@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
Message-ID: <570CF86B.3050804@oracle.com>

Tasks get evicted from the compile_queue if their invocation counter 
hasn't increased during TieredCompileTaskTimeout. 
(AdvancedThresholdPolicy::is_stale(...)).

I'll do a proper fix; it is the right thing to do and should be pretty
quick. I'll change the comment to an enum that represents who submitted
the compile, and add a table for the comments. This could be useful in
other settings too.
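One possible shape for the proposed change, sketched under assumptions: an enum records who submitted a compile, a parallel table supplies the comment strings, and the staleness check (no new invocations within the timeout) skips WhiteBox-requested tasks. CompileReason, CompileTaskSketch and is_stale_sketch are hypothetical names; the real logic lives in AdvancedThresholdPolicy::is_stale(...) and differs in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum recording who requested a compilation.
enum class CompileReason { Tiered, WhiteBox, Other };

// Comment strings, indexed by the enum value (must stay in the same order).
static const char* const compile_reason_names[] = {
  "tiered", "whitebox", "other"
};

inline const char* reason_name(CompileReason r) {
  return compile_reason_names[static_cast<int>(r)];
}

// Hypothetical snapshot of a queued task.
struct CompileTaskSketch {
  CompileReason reason;
  int64_t enqueue_time_ms;     // when the task was queued
  int invocations_at_enqueue;  // invocation count observed at enqueue time
};

// Stale = the method was not invoked again since it was queued and the task
// has waited longer than the timeout. WhiteBox requests are never stale.
inline bool is_stale_sketch(const CompileTaskSketch& t, int64_t now_ms,
                            int current_invocations, int64_t timeout_ms) {
  if (t.reason == CompileReason::WhiteBox) return false;
  bool no_new_invocations = current_invocations == t.invocations_at_enqueue;
  return no_new_invocations && (now_ms - t.enqueue_time_ms) > timeout_ms;
}
```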

Regards,
Nils

On 2016-04-08 19:09, Vladimir Kozlov wrote:
> What do you mean "stale"?
> I would prefer to see the real fix as you suggested to avoid removing 
> WB comp tasks from queue. Adding timeout is not reliable.
>
> Thanks,
> Vladimir
>
> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Please review this small fix of the BlockingCompilation test.
>>
>> Summary:
>> A method enqueued for compilation with the WB API may be removed from
>> the compile queue as stale.
>>
>> Solution:
>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets 
>> stale while the test is running. (Also added some extra
>> checks that may spare us from waiting until timeout for failing.)
>>
>> This is a workaround, but we should consider fixing something
>> permanent for WB API compiles - like tagging the compile
>> task with info about the origin of the compile. The comment field has 
>> this information - but then it needs to be
>> converted to an enum.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>
>> Best regards,
>> Nils Eliasson
>>
>>
>>
>>


From vladimir.kozlov at oracle.com  Tue Apr 12 16:33:12 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Apr 2016 09:33:12 -0700
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <570CD702.4070909@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
	<570B8F6B.2030206@oracle.com>
	<06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>
	<570CD702.4070909@oracle.com>
Message-ID: <570D2348.5000204@oracle.com>

You did not fix comment:

+ // public native Object Unsafe.allocateInstance(Class<?> cls);

should be:

+ // private native Object allocateInstance0(Class<?> cls) throws InstantiationException;

Another question: does it really throw InstantiationException?

Thanks,
Vladimir

On 4/12/16 4:07 AM, Vladimir Ivanov wrote:
>
>>> Additional flag in a mirror (j.l.Class) which marks instance klasses
>>> could help here, but I'm still not sure it's worth the effort.
>>>
>>> Ideally, something like [1] (which requires 2 new intrinsics):
>>
>> I would advise against that.  We are fixing a long-standing bug here
>> and although we see a regression the code we produced before was just
>> wrong.  Comparing against something that was wrong in the first place
>> is moot.
>
> It wasn't intended as a call for action, but more like a backup plan if
> there's a need to speed up the reflection case.
>
> I'd like to keep the fix simple and current version looks good enough:
>    http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00
>
> Any Reviews, please?
>
> Best regards,
> Vladimir Ivanov
>
>>
>> Take the hit; I doubt it will show up in customer applications.
>>
>>>
>>>   * isFastAllocatable() performs all necessary checks: null checks on
>>> cls, not primitive, not array, not interface, not abstract, fully
>>> initialized, no finalizers;
>>>
>>>   * allocateInstanceSlow() handles all cases the intrinsic doesn't
>>> handle: either throws IE or does necessary operations (e.g.,
>>> initialize the class or register a finalizer) when instantiating an
>>> object.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1]
>>>     @ForceInline
>>>     public Object allocateInstance(Class<?> cls) throws InstantiationException {
>>>         // Interfaces and abstract classes are handled by the intrinsic.
>>>         if (isFastAllocatable(cls)) {
>>>             return allocateInstance0(cls);
>>>         } else {
>>>             return allocateInstanceSlow(cls);
>>>         }
>>>     }
>>>
>>>     @HotSpotIntrinsicCandidate
>>>     private native boolean isFastAllocatable(Class<?> cls);
>>>
>>>     @HotSpotIntrinsicCandidate
>>>     private native Object allocateInstance0(Class<?> cls) throws InstantiationException;
>>>
>>>     // Calls into modified OptoRuntime::new_instance_C
>>>     @HotSpotIntrinsicCandidate
>>>     private native Object allocateInstanceSlow(Class<?> cls) throws InstantiationException;
>>>
>>>
>>>>
>>>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
>>>>
>>>> Suggestions to improve fidelity:
>>>>    * Run allocation benchmarks with -Xmx1g -Xms1g; this improves
>>>> variance
>>>>    * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on
>>>> @Benchmarks if you want to use -prof perfasm
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>>
>>

From vladimir.kozlov at oracle.com  Tue Apr 12 16:55:05 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Apr 2016 09:55:05 -0700
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570CF86B.3050804@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com>
Message-ID: <570D2869.5030206@oracle.com>

On 4/12/16 6:30 AM, Nils Eliasson wrote:
> Tasks get evicted from the compile_queue if their invocation counter
> hasn't increased during TieredCompileTaskTimeout.
> (AdvancedThresholdPolicy::is_stale(...)).
>
> I'll do a proper fix; it is the right thing to do and should be pretty
> quick. I'll change the comment to an enum that represents who submitted
> the compile, and add a table for the comments. This could be useful in
> other settings too.

Sounds good.

Thanks,
Vladimir

>
> Regards,
> Nils
>
> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>> What do you mean "stale"?
>> I would prefer to see the real fix as you suggested to avoid removing
>> WB comp tasks from queue. Adding timeout is not reliable.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> Please review this small fix of the BlockingCompilation test.
>>>
>>> Summary:
>>> A method enqueued for compilation with the WB API may be removed from
>>> the compile queue as stale.
>>>
>>> Solution:
>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>> stale while the test is running. (Also added some extra
>>> checks that may spare us from waiting until timeout for failing.)
>>>
>>> This is a workaround, but we should consider fixing something
>>> permanent for WB API compiles - like tagging the compile
>>> task with info about the origin of the compile. The comment field has
>>> this information - but then it needs to be
>>> converted to an enum.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>
>>> Best regards,
>>> Nils Eliasson
>>>
>>>
>>>
>>>
>

From vladimir.x.ivanov at oracle.com  Tue Apr 12 16:55:41 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Apr 2016 19:55:41 +0300
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <570D2348.5000204@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
	<570B8F6B.2030206@oracle.com>
	<06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>
	<570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com>
Message-ID: <570D288D.8020106@oracle.com>


On 4/12/16 7:33 PM, Vladimir Kozlov wrote:
> You did not fix comment:
>
> + // public native Object Unsafe.allocateInstance(Class<?> cls);
>
> should be:
>
> + // private native Object allocateInstance0(Class<?> cls) throws InstantiationException;

Ok, finally found where it is :-)
Incorporated (will update the webrev shortly).

> Another question: does it really throw InstantiationException?

Yes, it does throw IE from the runtime call on the slow path for abstract
classes & interfaces (they have the slow bit set in layout_helper).

I didn't move the check into Java, because I didn't want to add yet
another guard on the fast path.
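The fast/slow split can be modelled as a predicate over a class descriptor, using the conditions listed for the hypothetical isFastAllocatable() guard. ClassInfo, AllocPath and the C++ exceptions standing in for NullPointerException and InstantiationException are all hypothetical; HotSpot derives these facts from the mirror and Klass::layout_helper, so this is only a sketch of the control flow, not the intrinsic.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical summary of the class properties the fast path must check.
struct ClassInfo {
  bool is_null = false;
  bool is_primitive = false;
  bool is_array = false;
  bool is_interface = false;
  bool is_abstract = false;
  bool is_initialized = true;
  bool has_finalizer = false;
};

// True only when a plain inline allocation is legal.
inline bool is_fast_allocatable(const ClassInfo& c) {
  return !c.is_null && !c.is_primitive && !c.is_array && !c.is_interface &&
         !c.is_abstract && c.is_initialized && !c.has_finalizer;
}

enum class AllocPath { Fast, SlowInit, SlowFinalizer };

// Slow path: reject classes that can never be instantiated (the exceptions
// stand in for NPE and IE), otherwise report the extra work the fast path skips.
inline AllocPath allocate_instance_path(const ClassInfo& c) {
  if (is_fast_allocatable(c)) return AllocPath::Fast;
  if (c.is_null) throw std::invalid_argument("NullPointerException");
  if (c.is_primitive || c.is_array || c.is_interface || c.is_abstract)
    throw std::runtime_error("InstantiationException");
  if (!c.is_initialized) return AllocPath::SlowInit;
  return AllocPath::SlowFinalizer;
}
```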

Best regards,
Vladimir Ivanov

>
> On 4/12/16 4:07 AM, Vladimir Ivanov wrote:
>>
>>>> Additional flag in a mirror (j.l.Class) which marks instance klasses
>>>> could help here, but I'm still not sure it's worth the effort.
>>>>
>>>> Ideally, something like [1] (which requires 2 new intrinsics):
>>>
>>> I would advise against that.  We are fixing a long-standing bug here
>>> and although we see a regression the code we produced before was just
>>> wrong.  Comparing against something that was wrong in the first place
>>> is moot.
>>
>> It wasn't intended as a call for action, but more like a backup plan if
>> there's a need to speed up the reflection case.
>>
>> I'd like to keep the fix simple and current version looks good enough:
>>    http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00
>>
>> Any Reviews, please?
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>
>>> Take the hit; I doubt it will show up in customer applications.
>>>
>>>>
>>>>   * isFastAllocatable() performs all necessary checks: null checks on
>>>> cls, not primitive, not array, not interface, not abstract, fully
>>>> initialized, no finalizers;
>>>>
>>>>   * allocateInstanceSlow() handles all cases the intrinsic doesn't
>>>> handle: either throws IE or does necessary operations (e.g.,
>>>> initialize the class or register a finalizer) when instantiating an
>>>> object.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> [1]
>>>>     @ForceInline
>>>>     public Object allocateInstance(Class<?> cls) throws InstantiationException {
>>>>         // Interfaces and abstract classes are handled by the intrinsic.
>>>>         if (isFastAllocatable(cls)) {
>>>>             return allocateInstance0(cls);
>>>>         } else {
>>>>             return allocateInstanceSlow(cls);
>>>>         }
>>>>     }
>>>>
>>>>     @HotSpotIntrinsicCandidate
>>>>     private native boolean isFastAllocatable(Class<?> cls);
>>>>
>>>>     @HotSpotIntrinsicCandidate
>>>>     private native Object allocateInstance0(Class<?> cls) throws InstantiationException;
>>>>
>>>>     // Calls into modified OptoRuntime::new_instance_C
>>>>     @HotSpotIntrinsicCandidate
>>>>     private native Object allocateInstanceSlow(Class<?> cls) throws InstantiationException;
>>>>
>>>>
>>>>>
>>>>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
>>>>>
>>>>> Suggestions to improve fidelity:
>>>>>    * Run allocation benchmarks with -Xmx1g -Xms1g; this improves
>>>>> variance
>>>>>    * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on
>>>>> @Benchmarks if you want to use -prof perfasm
>>>>>
>>>>> Thanks,
>>>>> -Aleksey
>>>>>
>>>>>
>>>

From karen.kinnear at oracle.com  Tue Apr 12 17:18:32 2016
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Tue, 12 Apr 2016 13:18:32 -0400
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>
	<1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com>
	<4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com>
	<9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com>
	<713E9C18-274A-41F5-AA40-56A78A608763@oracle.com>
Message-ID: <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com>

Igor,

My apologies, I thought you had already decided to push. Yes, I am good with the changes.
Sorry to keep you waiting.

thanks,
Karen

> On Apr 6, 2016, at 1:49 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it. We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately.
> 
> Thanks,
> igor
> 
>> On Apr 5, 2016, at 4:22 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
>> 
>> 
>>> On Apr 5, 2016, at 3:33 PM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
>>> 
>>> Igor,
>>> 
>>> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter
>>> for instance?
>> 
>> Yes, I ran our RBT round of testing, which covers -Xcomp and -Xmixed.
>> 
>>> 
>>> If so, I am ok with checking this in - further notes below.
>>> 
>>>> On Apr 5, 2016, at 3:43 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
>>>> 
>>>> 
>>>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
>>>>> 
>>>>> I am in agreement with Lois that the JVMS looks good with moving the exception.
>>>> 
>>>> Thanks!
>>>>> 
>>>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next
>>>>> meeting I will check one more time. It might be worth adding a comment.
>>>> 
>>>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/
>>>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle.
>>>> 
>>>>> 
>>>>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks
>>>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null.
>>>>> 
>>>> 
>>>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903
>>>> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp).
>>> 
>>> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match
>>> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup.
>>> That is ok with me - I will add a note to the bug.
>> 
>> Could you please explain what is the problem again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa?
>> 
>>> 
>>> Also: I see a ciMethod::check_call that has a comment - 
>>> IT appears to fail when applied to an invoke interface call site.
>>> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. 
>>> 
>> 
>> This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas?
>> 
>> igor
>> 
>>> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take
>>> the subtleties of invoke interface and invoke special into account. 
>>>> 
>>>> igor
>>>> 
>>>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the
>>>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code,
>>>>> so that you get the correct behavior depending on the requesting byte code.
>>>>> 
>>>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so
>>>>> I could use help studying this a bit more to understand if this really is resolution or is really selection.
>>>>> 
>>>>> thanks,
>>>>> Karen
>>>>> 
>>>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>> 
>>>>>> 
>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>>>>> Hi Lois,
>>>>>>> 
>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>>>> 
>>>>>>> igor
>>>>>> Hi Igor,
>>>>>> 
>>>>>> Thanks for waiting on this.  A couple of comments:
>>>>>> 
>>>>>> - The JVMS Section 6.5 entry for invokeinterface does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>>>>> 
>>>>>> - I'm concerned about the changes on lines #998, #1030, and #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>>>>> 
>>>>>> Just curious: did you also run the testbase default methods tests?
>>>>>> Lois
>>>>>> 
>>>>>>> 
>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>>> 
>>>>>>>> Hi Igor,
>>>>>>>> 
>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Lois
>>>>>>>> 
>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>>>> 
>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 <https://bugs.openjdk.java.net/browse/JDK-8153115>
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/>
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> igor
>> 
> 
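The change under review moves the ICCE check for invokeinterface from run time to link time, because the check depends only on the resolved method and not on the receiver. A minimal C++ sketch of such a link-time check, assuming a toy model: `CallContext`, `ResolvedMethod`, and `linktime_check` are hypothetical names, not HotSpot's LinkResolver API, and the static/private rule follows the JVMS reading quoted in the thread.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified model for illustration; not HotSpot's LinkResolver.
// The call context names the kind of invoke being linked; as noted in the
// thread, it describes the context, not necessarily the actual bytecode.
enum class CallContext { InvokeStatic, InvokeSpecial, InvokeVirtual, InvokeInterface };

struct ResolvedMethod {
  std::string name;
  bool is_static;
  bool is_private;
};

// Link-time check: returns the error to raise, or std::nullopt if linking may
// proceed. Only the resolved method and the call context are inspected, never
// the receiver's runtime type, so the check can run before any receiver exists.
std::optional<std::string> linktime_check(const ResolvedMethod& m, CallContext ctx) {
  if (ctx == CallContext::InvokeInterface && (m.is_static || m.is_private)) {
    return "IncompatibleClassChangeError: " + m.name +
           (m.is_static ? " is static" : " is private");
  }
  return std::nullopt;
}
```

Because no receiver is needed, a compiler interface that performs only link-time resolution (the thread names ciEnv::lookup_method() and ConstantPool.lookupMethod()) can observe the failure while parsing.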


From igor.veresov at oracle.com  Tue Apr 12 18:54:33 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 12 Apr 2016 11:54:33 -0700
Subject: RFR(S) 8153115: Move private interface check to linktime
In-Reply-To: <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com>
References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com>
	<5703E823.8050400@oracle.com>
	<86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com>
	<5703F717.702@oracle.com>
	<E1FCFC2E-0D1F-4191-9F8B-DA1BF8817F4A@oracle.com>
	<1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com>
	<4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com>
	<9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com>
	<713E9C18-274A-41F5-AA40-56A78A608763@oracle.com>
	<2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com>
Message-ID: <CF23507E-6A48-4762-8959-8749696371FB@oracle.com>

Thanks, Karen!

igor

> On Apr 12, 2016, at 10:18 AM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
> 
> Igor,
> 
> My apologies, I thought you had already decided to push. Yes, I am good with the changes.
> Sorry to keep you waiting.
> 
> thanks,
> Karen
> 
>> On Apr 6, 2016, at 1:49 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
>> 
>> Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it. We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately.
>> 
>> Thanks,
>> igor
>> 
>>> On Apr 5, 2016, at 4:22 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
>>> 
>>> 
>>>> On Apr 5, 2016, at 3:33 PM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
>>>> 
>>>> Igor,
>>>> 
>>>> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter
>>>> for instance?
>>> 
>>> Yes, I ran our RBT round of testing, which does that with -Xcomp and -Xmixed.
>>> 
>>>> 
>>>> If so, I am ok with checking this in - further notes below.
>>>> 
>>>>> On Apr 5, 2016, at 3:43 PM, Igor Veresov <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
>>>>> 
>>>>> 
>>>>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear <karen.kinnear at oracle.com <mailto:karen.kinnear at oracle.com>> wrote:
>>>>>> 
>>>>>> I am in agreement with Lois that the JVMS looks good with moving the exception.
>>>>> 
>>>>> Thanks!
>>>>>> 
>>>>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next
>>>>>> meeting I will check one more time. It might be worth adding a comment.
>>>>> 
>>>>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/>
>>>>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle.
>>>>> 
>>>>>> 
>>>>>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks
>>>>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null.
>>>>>> 
>>>>> 
>>>>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 <https://bugs.openjdk.java.net/browse/JDK-8152903>
>>>>> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp).
>>>> 
>>>> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match
>>>> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup.
>>>> That is ok with me - I will add a note to the bug.
>>> 
>>> Could you please explain what is the problem again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa?
>>> 
>>>> 
>>>> Also: I see a ciMethod::check_call that has a comment - 
>>>> IT appears to fail when applied to an invoke interface call site.
>>>> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. 
>>>> 
>>> 
>>> This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas?
>>> 
>>> igor
>>> 
>>>> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take
>>>> the subtleties of invoke interface and invoke special into account. 
>>>>> 
>>>>> igor
>>>>> 
>>>>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the
>>>>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code,
>>>>>> so that you get the correct behavior depending on the requesting byte code.
>>>>>> 
>>>>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so
>>>>>> I could use help studying this a bit more to understand if this really is resolution or is really selection.
>>>>>> 
>>>>>> thanks,
>>>>>> Karen
>>>>>> 
>>>>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote:
>>>>>>>> Hi Lois,
>>>>>>>> 
>>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests.
>>>>>>>> 
>>>>>>>> igor
>>>>>>> Hi Igor,
>>>>>>> 
>>>>>>> Thanks for waiting on this.  A couple of comments:
>>>>>>> 
>>>>>>> - The JVMS Section 6.5 entry for invokeinterface does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private.  So I think moving this exception from runtime to linktime is okay.
>>>>>>> 
>>>>>>> - I'm concerned about the changes on lines #998, #1030, and #1316.  I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.  For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method.  Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false.
>>>>>>> 
>>>>>>> Just curious: did you also run the testbase default methods tests?
>>>>>>> Lois
>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan <lois.foltan at oracle.com <mailto:lois.foltan at oracle.com>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Igor,
>>>>>>>>> 
>>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review.  We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Lois
>>>>>>>>> 
>>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote:
>>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>>>>> 
>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 <https://bugs.openjdk.java.net/browse/JDK-8153115>
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ <http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/>
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> igor
>>> 
>> 
> 
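Karen's question of whether resolveMethod() performs resolution or selection can be illustrated with a toy model: link-time resolution looks the method up starting from the class named at the call site, and run-time selection then finds the implementation starting from the receiver's class. This is a hedged sketch with hypothetical names (`Klass`, `resolve_method`, `select_method`, `resolve_virtual_call`); it deliberately ignores interfaces, access checks, and default methods, which are the corner cases the thread says need dedicated tests.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy class model: each class maps method names to an implementation id.
struct Klass {
  std::string name;
  const Klass* super;                          // nullptr for the root class
  std::map<std::string, std::string> methods;  // method name -> implementation
};

// Link-time resolution: search from the static type named at the call site
// up the superclass chain. Returns the declaring class, or nullptr if the
// method does not exist (a linkage error in a real VM).
const Klass* resolve_method(const Klass* static_type, const std::string& m) {
  for (const Klass* k = static_type; k != nullptr; k = k->super) {
    if (k->methods.count(m) != 0) return k;
  }
  return nullptr;
}

// Run-time selection: search from the receiver's dynamic class for the most
// specific implementation of an already-resolved method.
std::string select_method(const Klass* receiver, const std::string& m) {
  for (const Klass* k = receiver; k != nullptr; k = k->super) {
    auto it = k->methods.find(m);
    if (it != k->methods.end()) return it->second;
  }
  return "";
}

// What a virtual call does end to end: resolve first, then select.
std::string resolve_virtual_call(const Klass* static_type, const Klass* receiver,
                                 const std::string& m) {
  if (resolve_method(static_type, m) == nullptr) return "";  // link-time failure
  return select_method(receiver, m);
}
```

Keeping the two steps as separate functions makes it explicit which checks belong to linkage (they need only the static type) and which need a receiver.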


From john.r.rose at oracle.com  Tue Apr 12 20:09:23 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 12 Apr 2016 13:09:23 -0700
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <570D288D.8020106@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
	<570B8F6B.2030206@oracle.com>
	<06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>
	<570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com>
	<570D288D.8020106@oracle.com>
Message-ID: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com>

On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> On 4/12/16 7:33 PM, Vladimir Kozlov wrote:
>> You did not fix comment:
>> 
>> + // public native Object Unsafe.allocateInstance(Class<?> cls);
>> 
>> should be:
>> 
>> + // private native Object allocateInstance0(Class<?> cls) throws
>> InstantiationException;
> 
> Ok, finally found where it is :-)
> Incorporated (will update the webrev shortly).
> 
>> An other question: does it really throw InstantiationException?
> 
> Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper).
> 
> I didn't move the check into Java, because I didn't want to add yet another guard on fast path.

A fix is necessary, but I'm not comfortable with the shape of the checking logic.
The C-coded JNI function (not the intrinsic) just surfaces the function JNIEnv::AllocObject.
This function calls some complicated C++ logic in Klass::check_valid_for_instantiation
to check for various things, including arrays and abstracts.  (There's also a primitive check.)

So the problem is that the JIT intrinsic doesn't mimic all these checks.
And a good tactic is to lift such checks into Java code, since Java is maintainable.
But, there is still a maintenance problem:  The checks in the proposed change
overlap with, but do not cover, the checks performed by JNIEnv::AllocObject.
Thus, it is difficult to prove that they are correct.  Some additional checks are
performed (in an ad hoc manner) by the JIT intrinsic.

Thus, the checking for a valid class is now in three places:  1. JNIEnv::AllocObject
(when the intrinsic is not used), 2. the new Java code (whether the intrinsic is used
or not), and 3. the partial checks in the intrinsic code (library_call.cpp).

The unit test will prevent regressions, but the code is still messy and hard to work with.

Can we make it better at this point?  Maybe not; maybe this is the least-bad point fix.
But it seems to me that a less-bad fix would put the required logic in two places
rather than three.  Two ways to do that are 1. push the prim and array checks from
Java down into the JIT intrinsic, next to the pre-existing checks, or 2. pull the
pre-existing JIT intrinsic tests up into Java.  Option 2 seems to require a new
intrinsic to capture the pre-existing intrinsic tests.

On the whole, since this Unsafe API point simply exposes JNIEnv::AllocObject,
I suggest doing the necessary work in library_call.cpp to make the intrinsic
accurately reflect that JNI function.  That will make the checks easier to verify
and maintain.  I don't think (AM I REALLY SAYING THIS?) the Java-based checks
help much in this particular case.

-- John

From christian.thalinger at oracle.com  Tue Apr 12 20:40:14 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 12 Apr 2016 10:40:14 -1000
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
	<570B8F6B.2030206@oracle.com>
	<06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>
	<570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com>
	<570D288D.8020106@oracle.com>
	<5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com>
Message-ID: <4D7414C9-9D41-4939-A7FE-95CDAC9E865C@oracle.com>


> On Apr 12, 2016, at 10:09 AM, John Rose <john.r.rose at oracle.com> wrote:
> 
> On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> wrote:
>> 
>> On 4/12/16 7:33 PM, Vladimir Kozlov wrote:
>>> You did not fix comment:
>>> 
>>> + // public native Object Unsafe.allocateInstance(Class<?> cls);
>>> 
>>> should be:
>>> 
>>> + // private native Object allocateInstance0(Class<?> cls) throws
>>> InstantiationException;
>> 
>> Ok, finally found where it is :-)
>> Incorporated (will update the webrev shortly).
>> 
>>> An other question: does it really throw InstantiationException?
>> 
>> Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper).
>> 
>> I didn't move the check into Java, because I didn't want to add yet another guard on fast path.
> 
> A fix is necessary, but I'm not comfortable with the shape of the checking logic.
> The C-coded JNI function (not the intrinsic) just surfaces the function JNIEnv::AllocObject.
> This function calls some complicated C++ logic in Klass::check_valid_for_instantiation
> to check for various things, including arrays and abstracts.  (There's also a primitive check.)
> 
> So the problem is that the JIT intrinsic doesn't mimic all these checks.
> And a good tactic is to lift such checks into Java code, since Java is maintainable.
> But, there is still a maintenance problem:  The checks in the proposed change
> overlap with, but do not cover, the checks performed by JNIEnv::AllocObject.
> Thus, it is difficult to prove that they are correct.  Some additional checks are
> performed (in an ad hoc manner) by the JIT intrinsic.
> 
> Thus, the checking for a valid class is now in three places:  1. JNIEnv::AllocObject
> (when the intrinsic is not used), 2. the new Java code (whether the intrinsic is used
> or not), and 3. the partial checks in the intrinsic code (library_call.cpp).
> 
> The unit test will prevent regressions, but the code is still messy and hard to work with.
> 
> Can we make it better at this point?  Maybe not; maybe this is the least-bad point fix.
> But it seems to me that a less-bad fix would put the required logic in two places
> rather than three.  Two ways to do that are 1. push the prim and array checks from
> Java down into the JIT intrinsic, next to the pre-existing checks, or 2. pull the
> pre-existing JIT intrinsic tests up into Java.  Option 2 seems to require a new
> intrinsic to capture the pre-existing intrinsic tests.
> 
> On the whole, since this Unsafe API point simply exposes JNIEnv::AllocObject,
> I suggest doing the necessary work in library_call.cpp to make the intrinsic
> accurately reflect that JNI function.  That will make the checks easier to verify
> and maintain.  I don't think (AM I REALLY SAYING THIS?) the Java-based checks
> help much in this particular case.

-1 (for obvious reasons)


From michael.c.berg at intel.com  Wed Apr 13 06:26:24 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Wed, 13 Apr 2016 06:26:24 +0000
Subject: CR for RFR 8153998
Message-ID: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>

Hi Folks,

I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.  See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.

This code was tested as follows (see jbs entry below):

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998

webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/

Thanks,
Michael
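The idea of replacing a multi-iteration scalar post loop with a single masked vector iteration can be emulated in plain C++. This is an illustration under stated assumptions, not the C2 implementation: `kLanes`, `masked_add`, and `add_arrays` are hypothetical, and the per-lane `if` stands in for hardware mask predication (opmask registers on EVEX targets).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vector width: 8 int lanes (e.g. one 256-bit vector).
constexpr std::size_t kLanes = 8;
static_assert(kLanes < 32, "mask must fit in a uint32_t");

// Emulates one masked vector add: lane i is read and written only if bit i
// of mask is set, the way predicated hardware suppresses inactive lanes.
void masked_add(int* dst, const int* a, const int* b, std::uint32_t mask) {
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) dst[lane] = a[lane] + b[lane];
  }
}

// dst[i] = a[i] + b[i] for every i < dst.size().
void add_arrays(std::vector<int>& dst, const std::vector<int>& a,
                const std::vector<int>& b) {
  const std::size_t n = dst.size();
  assert(a.size() >= n && b.size() >= n);
  std::size_t i = 0;
  // Main loop: full vectors with every lane enabled.
  for (; i + kLanes <= n; i += kLanes) {
    masked_add(&dst[i], &a[i], &b[i], (1u << kLanes) - 1);
  }
  // Post loop: the remaining n - i (< kLanes) elements are handled by one
  // masked iteration instead of up to kLanes - 1 scalar iterations.
  const std::size_t rem = n - i;
  if (rem > 0) {
    masked_add(&dst[i], &a[i], &b[i], (1u << rem) - 1);
  }
}
```

With n = 11 and kLanes = 8, the main loop covers elements 0 to 7, and one masked iteration with mask 0b111 covers elements 8 to 10, where a scalar post loop would take three iterations.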

From tobias.hartmann at oracle.com  Wed Apr 13 08:53:01 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 13 Apr 2016 10:53:01 +0200
Subject: [9] RFR(S): 8154073: Several compiler tests fail when are executed
	with C1 only
Message-ID: <570E08ED.4010207@oracle.com>

Hi,

please review the following patch:

https://bugs.openjdk.java.net/browse/JDK-8154073
http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/

TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests.

TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message.

Tested with RBT (running).

Thanks,
Tobias

From nils.eliasson at oracle.com  Wed Apr 13 12:59:30 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 13 Apr 2016 14:59:30 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570D2869.5030206@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
Message-ID: <570E42B2.2090306@oracle.com>

Hi,

New webrev:
http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/

Summary
Introduced an enum CompileReason with members matching all the old 
variants, and a table containing all the unchanged strings. I see the 
possibility of removing/changing/simplifying some CompileReasons but 
have chosen not to do so in this change.

Only new logic is the CompileTask::can_become_stale() method.
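A minimal sketch of the enum-plus-table pattern described above, together with a can_become_stale() predicate. The member names, the strings, and the choice of which reasons may go stale are assumptions for illustration; the webrev's actual CompileReason values may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical subset of compile reasons; the webrev's members may differ.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_Whitebox,
  Reason_Count  // number of reasons; must stay last
};

// One string per reason, indexed by CompileReason.
static const char* const reason_names[] = {
  "no_reason",
  "count",
  "backedge_count",
  "tiered",
  "whitebox",
};

// Fails to compile if a reason is added without a matching string.
static_assert(sizeof(reason_names) / sizeof(reason_names[0]) ==
                  static_cast<std::size_t>(Reason_Count),
              "reason_names must have one entry per CompileReason");

const char* reason_name(CompileReason r) {
  assert(r < Reason_Count);
  return reason_names[r];
}

// In this sketch every reason except Reason_Whitebox may be evicted as stale;
// explicit WhiteBox requests must stay queued until they are compiled.
bool can_become_stale(CompileReason r) {
  switch (r) {
    case Reason_Whitebox:
      return false;
    default:
      return true;
  }
}
```

The static_assert is one way to keep the enum and the string table from drifting apart as reasons are added or removed.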

Testing:
Running Testset hotspot on all platforms and hotspot_all on one platform

Regards,
Nils Eliasson

On 2016-04-12 18:55, Vladimir Kozlov wrote:
> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>> Tasks get evicted from the compile_queue if their invocation counter
>> hasn't increased during TieredCompileTaskTimeout.
>> (AdvancedThresholdPolicy::is_stale(...)).
>>
>> I'll do a proper fix, it is the right thing to do and should be pretty
>> quick. I'll change the comment to an enum that represent who submitted
>> the compile, and add a table for the comments. This could be useful in
>> other settings too.
>
> Sounds good.
>
> Thanks,
> Vladimir
>
>>
>> Regards,
>> Nils
>>
>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>> What do you mean "stale"?
>>> I would prefer to see the real fix as you suggested to avoid removing
>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> Please review this small fix of the BlockingCompilation test.
>>>>
>>>> Summary:
>>>> A method enqueued for compilation with the WB API may be removed from
>>>> the compile queue as stale.
>>>>
>>>> Solution:
>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>> stale while the test is running. (Also added some extra
>>>> checks that may spare us from waiting until timeout for failing.)
>>>>
>>>> This is a workaround, but we should consider a permanent fix for
>>>> WB API compiles, like tagging the compile task with info about the
>>>> origin of the compile. The comment field has this information, but
>>>> then it needs to be converted to an enum.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>
>>>> Best regards,
>>>> Nils Eliasson
>>>>
>>>>
>>>>
>>>>
>>
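The eviction rule described in this thread (a queued task is dropped when its method's invocation counter has not grown within TieredCompileTaskTimeout), plus an exemption for tasks that cannot become stale, can be modeled as below. This is a hedged sketch: `QueuedTask`, `is_stale`, and `purge_stale` are hypothetical and only approximate what AdvancedThresholdPolicy::is_stale() decides.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical queued compile task (not HotSpot's CompileTask).
struct QueuedTask {
  int id;
  bool can_become_stale;      // false for explicitly requested compiles
  std::int64_t counter;       // current invocation count of the method
  std::int64_t seen_counter;  // invocation count recorded at seen_ms
  std::int64_t seen_ms;       // last time the count was observed to grow
};

// Stale means: the invocation count has not grown for more than timeout_ms.
// Observing growth refreshes the task's bookkeeping.
bool is_stale(QueuedTask& t, std::int64_t now_ms, std::int64_t timeout_ms) {
  if (!t.can_become_stale) return false;
  if (t.counter > t.seen_counter) {
    t.seen_counter = t.counter;
    t.seen_ms = now_ms;
    return false;
  }
  return now_ms - t.seen_ms > timeout_ms;
}

// Remove stale tasks from the queue, keeping the order of the rest.
void purge_stale(std::list<QueuedTask>& queue, std::int64_t now_ms,
                 std::int64_t timeout_ms) {
  for (auto it = queue.begin(); it != queue.end();) {
    if (is_stale(*it, now_ms, timeout_ms)) {
      it = queue.erase(it);
    } else {
      ++it;
    }
  }
}
```

Raising the timeout, as the first webrev did, only makes eviction less likely; a per-task exemption removes it for WhiteBox requests, which matches Vladimir's point that a timeout is not reliable.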


From vladimir.x.ivanov at oracle.com  Wed Apr 13 16:01:52 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 13 Apr 2016 19:01:52 +0300
Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error when
	invoking nonexistent method
Message-ID: <570E6D70.40904@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8154172

C1 unconditionally inserts a null check before doing a call, even if the
call throws an error during linkage. This contradicts the JVMS, which
requires that linking errors precede run-time errors.

The fix is to detect non-resolvable cases and avoid null checks and
profiling altogether, letting the runtime throw a linkage error.

Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck).

Some clarifications:

    - klass->is_loaded() && !target->is_loaded() is true when method 
resolution fails;

    - static vs non-static checks aren't needed because 
stream()->get_method already returns unloaded method in such case;

Thanks!

Best regards,
Vladimir Ivanov
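The ordering requirement can be illustrated with a toy code generator that, like the fix described above, skips the receiver null check and profiling when the call is known to be unresolvable, leaving the linkage error to the runtime. `CallSite` and `emit_invoke` are hypothetical, the emitted "operations" are only labels, and this is not C1 code; cases where the holder itself is not loaded are simply treated as the normal path here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical call-site summary; the first two fields mirror the conditions
// named above (klass->is_loaded(), target->is_loaded()).
struct CallSite {
  bool holder_loaded;
  bool target_loaded;
  bool has_receiver;  // false for invokestatic
};

// Returns the sequence of symbolic operations emitted for the call.
std::vector<std::string> emit_invoke(const CallSite& cs) {
  std::vector<std::string> code;
  // A loaded holder with an unloaded target means method resolution failed.
  const bool unresolvable = cs.holder_loaded && !cs.target_loaded;
  if (unresolvable) {
    // No null check and no profiling: the runtime call raises the linkage
    // error, which must come before any NullPointerException.
    code.push_back("call_runtime_and_throw_linkage_error");
    return code;
  }
  if (cs.has_receiver) code.push_back("null_check_receiver");
  code.push_back("profile_call");
  code.push_back("invoke");
  return code;
}
```

Before the fix, the equivalent of `null_check_receiver` was emitted unconditionally, so a null receiver produced an NPE even when linkage should have failed first.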

From vladimir.x.ivanov at oracle.com  Wed Apr 13 16:47:19 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 13 Apr 2016 19:47:19 +0300
Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance
	doesn't properly filter out array classes
In-Reply-To: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com>
References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com>
	<570B8F6B.2030206@oracle.com>
	<06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com>
	<570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com>
	<570D288D.8020106@oracle.com>
	<5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com>
Message-ID: <570E7817.1040106@oracle.com>

Thanks for the feedback, John.

I see your point.

Actually, after looking at check_valid_for_instantiation more carefully, 
I found one more missing check in the intrinsic: Class instantiation is 
forbidden in InstanceKlass::check_valid_for_instantiation, but the 
intrinsic allows it.

So I agree it would be desirable to minimize duplication in the code.

Am I right that you are in favor of the following approach?

   http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path/

I'll experiment to see how it shapes up in both cases.

Best regards,
Vladimir Ivanov
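One way to read the suggestion of keeping the validity checks in fewer places is a single predicate that every allocation path consults. A hedged sketch covering the cases named in this thread (primitive, array, abstract class or interface, and java.lang.Class): `ClassInfo`, `Rejection`, `instantiation_rejection`, and `fast_path_allowed` are hypothetical, and the mapping from each rejection to a specific Java exception is deliberately left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical class descriptor for illustration.
struct ClassInfo {
  std::string name;
  bool is_primitive;
  bool is_array;
  bool is_abstract;
  bool is_interface;
};

enum class Rejection { None, Primitive, Array, AbstractOrInterface, JavaLangClass };

// The single list of instantiation checks, in one place.
Rejection instantiation_rejection(const ClassInfo& c) {
  if (c.is_primitive) return Rejection::Primitive;
  if (c.is_array) return Rejection::Array;
  if (c.is_abstract || c.is_interface) return Rejection::AbstractOrInterface;
  if (c.name == "java.lang.Class") return Rejection::JavaLangClass;
  return Rejection::None;
}

// A compiled fast path and the slow path would both call the same predicate,
// so there is no second copy of the checks to keep in sync by hand.
bool fast_path_allowed(const ClassInfo& c) {
  return instantiation_rejection(c) == Rejection::None;
}
```

The missing java.lang.Class case found above is the kind of drift that a single shared predicate is meant to prevent.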

On 4/12/16 11:09 PM, John Rose wrote:
> On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> wrote:
>>
>> On 4/12/16 7:33 PM, Vladimir Kozlov wrote:
>>> You did not fix comment:
>>>
>>> + // public native Object Unsafe.allocateInstance(Class<?> cls);
>>>
>>> should be:
>>>
>>> + // private native Object allocateInstance0(Class<?> cls) throws
>>> InstantiationException;
>>
>> Ok, finally found where it is :-)
>> Incorporated (will update the webrev shortly).
>>
>>> An other question: does it really throw InstantiationException?
>>
>> Yes, it does throw IE from runtime call on slow path for abstract
>> classes & interfaces (they have slow bit set in layout_helper).
>>
>> I didn't move the check into Java, because I didn't want to add yet
>> another guard on fast path.
>
> A fix is necessary, but I'm not comfortable with the shape of the
> checking logic.
> The C-coded JNI function (not the intrinsic) just surfaces the function
> JNIEnv::AllocObject.
> This function calls some complicated C++ logic
> in Klass::check_valid_for_instantiation
> to check for various things, including arrays and abstracts.  (There's
> also a primitive check.)
>
> So the problem is that the JIT intrinsic doesn't mimic all these checks.
> And a good tactic is to lift such checks into Java code, since Java is
> maintainable.
> But, there is still a maintenance problem:  The checks in the proposed
> change
> overlap with, but do not cover, the checks performed by JNIEnv::AllocObject.
> Thus, it is difficult to prove that they are correct.  Some additional
> checks are
> performed (in an ad hoc manner) by the JIT intrinsic.
>
> Thus, the checking for a valid class is now in three places:  1.
> JNIEnv::AllocObject
> (when the intrinsic is not used), 2. the new Java code (whether the
> intrinsic is used
> or not), and 3. the partial checks in the intrinsic code (library_call.cpp).
>
> The unit test will prevent regressions, but the code is still messy and
> hard to work with.
>
> Can we make it better at this point?  Maybe not; maybe this is the
> least-bad point fix.
> But it seems to me that a less-bad fix would put the required logic in
> two places
> rather than three.  Two ways to do that are 1. push the prim and array
> checks from
> Java down into the JIT intrinsic, next to the pre-existing checks, or 2.
> pull the
> pre-existing JIT intrinsic tests up into Java.  Option 2 seems to
> require a new
> intrinsic to capture the pre-existing intrinsic tests.
>
> On the whole, since this Unsafe API point simply exposes
> JNIEnv::AllocObject,
> I suggest doing the necessary work in library_call.cpp to make the intrinsic
> accurately reflect that JNI function.  That will make the checks easier
> to verify
> and maintain.  I don't think (AM I REALLY SAYING THIS?) the Java-based
> checks
> help much in this particular case.
>
> -- John

From vladimir.kozlov at oracle.com  Wed Apr 13 21:02:07 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 13 Apr 2016 14:02:07 -0700
Subject: [9] RFR(S): 8154073: Several compiler tests fail when are
	executed with C1 only
In-Reply-To: <570E08ED.4010207@oracle.com>
References: <570E08ED.4010207@oracle.com>
Message-ID: <570EB3CF.5010408@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/13/16 1:53 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8154073
> http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/
>
> TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests.
>
> TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message.
>
> Tested with RBT (running).
>
> Thanks,
> Tobias
>

From christian.thalinger at oracle.com  Wed Apr 13 21:08:02 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 13 Apr 2016 11:08:02 -1000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
Message-ID: <E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>


> On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> Hi Folks,
> 
> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.  See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
> This component delivers mask-programmed post loops that execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops.
> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.
>  
> This code was tested as follows (see jbs entry below):
> 
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
> 
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
+//------------------------------MachMskNode-----------------------------------
+// Machine function Msk Node
+class MachMskNode : public MachIdealNode {
Does "Msk" mean mask?  Then we should call it MachMaskNode.

Also, I don't quite understand why we have:
+instruct set_mask(rRegI dst, rRegI src) %{
+  predicate(VM_Version::supports_avx512vl());
+  match(Set dst (MaskCreateI src));
+  effect(TEMP dst);
+  format %{ "createmsk   $dst, $src" %}
+  ins_encode %{
+    __ createmsk($dst$$Register, $src$$Register);
+  %}
but:
+  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
+    MacroAssembler _masm(&cbuf);
+    __ restoremsk();
+  }

>  
> Thanks,
> Michael


From vladimir.kozlov at oracle.com  Wed Apr 13 21:34:49 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 13 Apr 2016 14:34:49 -0700
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570E42B2.2090306@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com>
Message-ID: <570EBB79.7060805@oracle.com>

Very nice, I like it.

One note. CompileReason (and its names) should be in the CompileTask class,
where it is recorded. Then CompileTask::can_become_stale() can be in the
header file so it is inlined on all platforms.

Thanks,
Vladimir

On 4/13/16 5:59 AM, Nils Eliasson wrote:
> Hi,
>
> New webrev:
> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>
> Summary
> Introduced an enum CompileReason with members matching all the old
> variants, and a table containing all the unchanged strings. I see the
> possibility of removing/changing/simplifying some CompileReasons but
> have chosen not to do so in this change.
>
> The only new logic is the CompileTask::can_become_stale() method.
>
> Testing:
> Running Testset hotspot on all platforms and hotspot_all on one platform
>
> Regards,
> Nils Eliasson
>
> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>> Tasks get evicted from the compile_queue if their invocation counter
>>> hasn't increased during TieredCompileTaskTimeout.
>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>
>>> I'll do a proper fix, it is the right thing to do and should be pretty
>>> quick. I'll change the comment to an enum that represents who submitted
>>> the compile, and add a table for the comments. This could be useful in
>>> other settings too.
>>
>> Sounds good.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Regards,
>>> Nils
>>>
>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>> What do you mean "stale"?
>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>> Hi,
>>>>>
>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>
>>>>> Summary:
>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>> the compile queue as stale.
>>>>>
>>>>> Solution:
>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>> stale while the test is running. (Also added some extra
>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>
>>>>> This is a workaround but we should consider fixing something
>>>>> permanent for WB API compiles - like tagging the compile
>>>>> task with info about the origin of the compile. The comment field has
>>>>> this information - but then it needs to be
>>>>> converted to an enum.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>
>>>>> Best regards,
>>>>> Nils Eliasson
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>

From michael.c.berg at intel.com  Wed Apr 13 21:35:39 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Wed, 13 Apr 2016 21:35:39 +0000
Subject: CR for RFR 8153998
In-Reply-To: <E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>

See below for context.

Regards,
Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Wednesday, April 13, 2016 2:08 PM
To: Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8153998


On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:

Hi Folks,

I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.  See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
This component delivers mask-programmed post loops that execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops.
Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.

This code was tested as follows (see jbs entry below):

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998

webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/


+//------------------------------MachMskNode-----------------------------------
+// Machine function Msk Node
+class MachMskNode : public MachIdealNode {
Does "Msk" mean mask?  Then we should call it MachMaskNode.

<MCB> Ok, that's easy enough.

Also, I don't quite understand why we have:
+instruct set_mask(rRegI dst, rRegI src) %{
+  predicate(VM_Version::supports_avx512vl());
+  match(Set dst (MaskCreateI src));
+  effect(TEMP dst);
+  format %{ "createmsk   $dst, $src" %}
+  ins_encode %{
+    __ createmsk($dst$$Register, $src$$Register);
+  %}
but:
+  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
+    MacroAssembler _masm(&cbuf);
+    __ restoremsk();
+  }

The reason: k registers (mask registers) are currently not allocated by the register allocator, so we have to treat their usage as side effects.
set_mask takes our remaining iterations as an index and provides the mask in k1 that applies to its post loop.
The subsequent restore places the default value back into k1.  The set_mask rule is written so that the side-effect value survives optimization.
The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions.
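A toy scalar model of the masked post loop described above, not HotSpot code: the vector width kLanes, the global g_k1 standing in for the k1 mask register, and all helper names are hypothetical. set_mask turns the remaining trip count into a mask and restore_mask puts the default all-lanes value back, mirroring the set_mask / restoremsk pair. g_k1 is modelled as global state because, as explained above, mask registers are not register-allocated and are handled as side effects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model, not HotSpot code. kLanes is a hypothetical vector width.
constexpr std::size_t kLanes = 8;
constexpr uint32_t kAllLanes = (uint32_t{1} << kLanes) - 1;  // default mask

// Stands in for k1: global machine state, since setting or restoring the
// mask is a side effect rather than a register-allocated value.
uint32_t g_k1 = kAllLanes;

// "set_mask": map the remaining trip count to a mask with that many low bits set.
void set_mask(std::size_t remaining) {
  assert(remaining > 0 && remaining < kLanes);  // only a partial vector is left
  g_k1 = (uint32_t{1} << remaining) - 1;
}

// "restoremsk": put the default all-lanes value back into k1.
void restore_mask() { g_k1 = kAllLanes; }

// One masked vector operation: a[base + lane] += b[base + lane] for enabled lanes.
void masked_add(int32_t* a, const int32_t* b, std::size_t base) {
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (g_k1 & (uint32_t{1} << lane)) a[base + lane] += b[base + lane];
  }
}

// a[i] += b[i] for i in [0, n); returns the number of vector iterations executed.
std::size_t add_arrays(int32_t* a, const int32_t* b, std::size_t n) {
  std::size_t iterations = 0;
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes, ++iterations) {  // main vector loop
    masked_add(a, b, i);                                // k1 has all lanes set
  }
  if (i < n) {                                          // post loop: one iteration
    set_mask(n - i);
    masked_add(a, b, i);
    restore_mask();
    ++iterations;
  }
  return iterations;
}
```

With n = 11 and 8 lanes this runs two vector iterations (one full, one masked over 3 lanes), where a scalar fix-up loop would need 1 + 3 = 4 loop iterations.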

Thanks,
Michael


From vladimir.kozlov at oracle.com  Wed Apr 13 21:52:36 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 13 Apr 2016 14:52:36 -0700
Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error
	when invoking nonexistent method
In-Reply-To: <570E6D70.40904@oracle.com>
References: <570E6D70.40904@oracle.com>
Message-ID: <570EBFA4.4060005@oracle.com>

Looks good to me. The ciEnv.cpp changes look empty. If they are only spacing
changes, we don't need to include them in the bug fix.

thanks,
Vladimir K

On 4/13/16 9:01 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8154172
>
> C1 unconditionally inserts a null check before doing a call, even if the call
> throws an error during linkage. This contradicts the JVMS, which requires that
> linking errors precede run-time errors.
>
> The fix is to detect non-resolvable cases and avoid null checks /
> profiling altogether, letting the runtime throw a linkage error.
>
> Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck).
>
> Some clarifications:
>
>     - klass->is_loaded() && !target->is_loaded() is true when method
> resolution fails;
>
>     - static vs non-static checks aren't needed because
> stream()->get_method already returns an unloaded method in such a case;
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
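A toy model of the ordering rule, not C1 code: invoke_per_jvms reports a resolution failure before the receiver null check, while invoke_null_check_first reproduces the order that an unconditional null check ahead of the call produces. The enum and function names are hypothetical.

```cpp
// Toy model, not C1 code: which exception an invoke reports.
enum class Outcome { Ok, LinkageError, NullPointerException };

// JVMS order: method resolution (linking) errors come before the
// run-time null check on the receiver.
constexpr Outcome invoke_per_jvms(bool target_resolves, bool receiver_is_null) {
  return !target_resolves   ? Outcome::LinkageError
         : receiver_is_null ? Outcome::NullPointerException
                            : Outcome::Ok;
}

// Order produced by an unconditional null check emitted ahead of the call.
constexpr Outcome invoke_null_check_first(bool target_resolves, bool receiver_is_null) {
  return receiver_is_null   ? Outcome::NullPointerException
         : !target_resolves ? Outcome::LinkageError
                            : Outcome::Ok;
}
```

The two functions only disagree when the target does not resolve and the receiver is null, which is exactly the case the fix addresses.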

From vladimir.kozlov at oracle.com  Thu Apr 14 00:40:44 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 13 Apr 2016 17:40:44 -0700
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
Message-ID: <570EE70C.8000906@oracle.com>

Hi Michael,

Please split the changes. The _rex_vex_w_reverted (and other assembler) changes
can be pushed first. The evmovdqul -> evmovdquq and Vectors element_size()
changes could be pushed separately too.

You don't need MachMskNode placeholder methods in the other platforms' .ad files.
I think Matcher::has_predicated_vectors() will be enough, since
MachMskNode is generated only when has_predicated_vectors() is true.
This is how we usually do it.

macroAssembler_x86.cpp

Why do you use a table rather than instructions to generate the mask value?
Looking at the table, the values are very easy to generate (you would need an
additional instruction, but I think that is better than a load from memory):

(1 << src) - 1

src == 0 could be treated specially.
You can leave the table as a comment to show which values are expected.
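A sketch of the table-versus-arithmetic equivalence suggested above. The table here is hypothetical (only eight entries, for an assumed 8-lane mask) and is not the table from the webrev. In C++ the src == 0 case already yields 0 from (1 << 0) - 1; the suggestion to treat it specially presumably concerns the generated instruction sequence, which this sketch does not model.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical table: entry n has the low n bits set (first 8 entries only).
constexpr uint32_t kMaskTable[] = {0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F};
constexpr std::size_t kTableSize = sizeof(kMaskTable) / sizeof(kMaskTable[0]);

// Table approach: load the mask from memory.
constexpr uint32_t mask_from_table(std::size_t n) { return kMaskTable[n]; }

// Suggested approach: compute it. Valid for n < 32 with a 32-bit mask.
constexpr uint32_t mask_from_shift(std::size_t n) { return (uint32_t{1} << n) - 1; }

// True if both approaches agree on every entry of the table.
constexpr bool shift_matches_table() {
  for (std::size_t n = 0; n < kTableSize; ++n) {
    if (mask_from_table(n) != mask_from_shift(n)) return false;
  }
  return true;
}
```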

x86.ad

Names should be consistent: MaskCreateINode -> CreateMaskINode, set_mask
-> createMask. You can also use Matcher::has_predicated_vectors() in the
predicate:

+instruct createMask(rRegI dst, rRegI src) %{
+  predicate(Matcher::has_predicated_vectors());
+  match(Set dst (CreateMaskI src));
+  effect(TEMP dst);
+  format %{ "createmsk   $dst, $src" %}

Maybe it should be setMask, as the counterpart of restoreMask, or more precisely
setvectmask/restorevectmask.


MaskCreateINode or SetVectMaskINode should be defined in vector.hpp and 
not in subnode.hpp.

block.cpp
Matcher::has_predicated_vectors() should be checked together with
if (found_fixup_loops) to avoid useless looping.

I don't like how you inject MachMskNode. It should be generated on exit
from the loop where you created MaskCreateINode.

This will need additional review after you address the comments above.

Thanks,
Vladimir

On 4/12/16 11:26 PM, Berg, Michael C wrote:
> Hi Folks,
>
> I would like to contribute Programmable SIMD as implemented on
> multi-versioned post loops. See:
> https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of
> the implementation.
>
> This component delivers mask-programmed post loops that execute in a
> single iteration in place of the scalar fix-up loops that used to take many
> iterations to complete the work of user loops.
>
> Currently I have enabled this optimization for x86 only, specifically
> for machines with masked data predication implemented as per fully
> enabled EVEX targets.  It delivers up to 2x performance and has been
> modeled over a large number of loop lengths and forms of loops.
>
> This code was tested as follows (see jbs entry below):
>
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>
>
> webrev:
>
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>
> Thanks,
>
> Michael
>

From tobias.hartmann at oracle.com  Thu Apr 14 06:31:10 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 14 Apr 2016 08:31:10 +0200
Subject: [9] RFR(S): 8154073: Several compiler tests fail when are
	executed with C1 only
In-Reply-To: <570EB3CF.5010408@oracle.com>
References: <570E08ED.4010207@oracle.com> <570EB3CF.5010408@oracle.com>
Message-ID: <570F392E.40608@oracle.com>

Thanks, Vladimir!

Best regards,
Tobias

On 13.04.2016 23:02, Vladimir Kozlov wrote:
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 4/13/16 1:53 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8154073
>> http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/
>>
>> TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests.
>>
>> TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message.
>>
>> Tested with RBT (running).
>>
>> Thanks,
>> Tobias
>>

From rwestrel at redhat.com  Thu Apr 14 06:46:49 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 14 Apr 2016 08:46:49 +0200
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
Message-ID: <dk61t683cx2.fsf@rwestrel.remote.csb>


When running scimark on aarch64:

 ;; B16: #	B17 <- B21  top-of-loop Freq: 2305.21

  0x000003ffa126f710: add	w17, w11, w12   ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - jnt.scimark2.FFT::transform_internal@243 (line 129)

  0x000003ffa126f714: nop
  0x000003ffa126f718: nop
  0x000003ffa126f71c: nop                       ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0}
                                                ; - jnt.scimark2.FFT::transform_internal@238 (line 129)

 ;; B17: #	B32 B18 <- B25 B16 	Loop: B17-B16 inner  Freq: 3056.06

  0x000003ffa126f720: lsl	w16, w17, #1    ;*imul {reexecute=0 rethrow=0 return_oop=0}
                                                ; - jnt.scimark2.FFT::transform_internal@244 (line 129)

The 3 nops are added by the code that aligns loop entries: the top-of-loop
block is encountered first and its alignment is set; the loop head is
encountered later, through the back branch of an outer loop, and its
alignment is set as well.
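The nop count in the listing can be checked with simple arithmetic. AArch64 instructions, including nop, are 4 bytes, so the add at 0x000003ffa126f710 ends at 0x...f714, and B17 starts at 0x...f720. Assuming a 16-byte loop alignment (the email does not state which value was in effect), 0x...f714 mod 16 = 4, so the padding is 16 - 4 = 12 bytes, i.e. three nops. The helper name below is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: number of fixed-size nops needed to advance `pc`
// to the next multiple of `alignment` (0 if it is already aligned).
constexpr uint64_t padding_nops(uint64_t pc, uint64_t alignment, uint64_t nop_size) {
  return ((alignment - pc % alignment) % alignment) / nop_size;
}
```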

I propose that the code that aligns loop entries verifies that a loop
top doesn't exist before it sets the alignment:

http://cr.openjdk.java.net/~roland/8154135/webrev.00/

Roland.

From igor.veresov at oracle.com  Thu Apr 14 07:01:28 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 14 Apr 2016 00:01:28 -0700
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570E42B2.2090306@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com>
Message-ID: <BA04AE25-E6E2-4510-B0C0-F11579F37A16@oracle.com>

Sorry for nitpicking, but can't the compile_reason argument be of type CompileReason instead of int everywhere? It would also be nice to place reason_name close to the enum.

igor


> On Apr 13, 2016, at 5:59 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> Hi,
> 
> New webrev:
> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
> 
> Summary
> Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change.
> 
> The only new logic is the CompileTask::can_become_stale() method.
> 
> Testing:
> Running Testset hotspot on all platforms and hotspot_all on one platform
> 
> Regards,
> Nils Eliasson
> 
> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>> Tasks get evicted from the compile_queue if their invocation counter
>>> hasn't increased during TieredCompileTaskTimeout.
>>> (AdvancedThresholdPolicy::is_stale(...)).
>>> 
>>> I'll do a proper fix, it is the right thing to do and should be pretty
>>> quick. I'll change the comment to an enum that represents who submitted
>>> the compile, and add a table for the comments. This could be useful in
>>> other settings too.
>> 
>> Sounds good.
>> 
>> Thanks,
>> Vladimir
>> 
>>> 
>>> Regards,
>>> Nils
>>> 
>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>> What do you mean "stale"?
>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>> Hi,
>>>>> 
>>>>> Please review this small fix of the BlockingCompilation test.
>>>>> 
>>>>> Summary:
>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>> the compile queue as stale.
>>>>> 
>>>>> Solution:
>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>> stale while the test is running. (Also added some extra
>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>> 
>>>>> This is a workaround but we should consider fixing something
>>>>> permanent for WB API compiles - like tagging the compile
>>>>> task with info about the origin of the compile. The comment field has
>>>>> this information - but then it needs to be
>>>>> converted to an enum.
>>>>> 
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>> 
>>>>> Best regards,
>>>>> Nils Eliasson
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
> 


From zoltan.majo at oracle.com  Thu Apr 14 11:47:28 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 14 Apr 2016 13:47:28 +0200
Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the
	end of the heap
Message-ID: <570F8350.5080209@oracle.com>

Hi,


please review the patch for 8151708.

https://bugs.openjdk.java.net/browse/JDK-8151708

Problem: On solaris_sparc, the VM can set the TLAB's top pointer to a 
value past the end of the Java heap. The problem appears with large 
values of MinTLABSize. The reason for the problem is that the 'brcs' 
instruction at

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3260
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3265

checks the condition codes in 'icc' (32-bit), but not in 'xcc' (64-bit).

Solution: As the VM is handling addresses at the above-mentioned
locations, the appropriate condition codes must be checked.
Use 'BPcc' instead of 'Bicc' at these locations.
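A toy model of why the choice of condition codes matters, not a SPARC emulator: it treats a compare followed by a branch on carry set as an unsigned a < b, evaluated either on the low 32 bits (icc) or on the full 64 bits (xcc). The addresses in the checks are made up; they only show that a value just past a 4 GB boundary can compare as "below" a limit under icc while xcc correctly reports it as above. The thread later settles on 'brx', which tests xcc or icc depending on the architecture.

```cpp
#include <cstdint>

// Toy model, not a SPARC emulator: "cmp a, b" followed by a branch on
// carry set is treated as an unsigned a < b.

// icc: only the low 32 bits of each operand participate.
constexpr bool carry_set_icc(uint64_t a, uint64_t b) {
  return static_cast<uint32_t>(a) < static_cast<uint32_t>(b);
}

// xcc: the full 64-bit values participate.
constexpr bool carry_set_xcc(uint64_t a, uint64_t b) {
  return a < b;
}
```

With a = 0x0000000100000010 and b = 0x00000000FFFFFF00, icc reports carry set (the low halves compare as 0x10 < 0xFFFFFF00) while xcc does not.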

Webrev:
http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/

Testing:
- JPRT
- reproducer on solaris_sparc.

Thank you!

Best regards,


Zoltan


From tobias.hartmann at oracle.com  Thu Apr 14 12:15:19 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 14 Apr 2016 14:15:19 +0200
Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past
	the end of the heap
In-Reply-To: <570F8350.5080209@oracle.com>
References: <570F8350.5080209@oracle.com>
Message-ID: <570F89D7.7080809@oracle.com>

Hi Zoltan,

On 14.04.2016 13:47, Zoltán Majó wrote:
> Hi,
> 
> 
> please review the patch for 8151708.
> 
> https://bugs.openjdk.java.net/browse/JDK-8151708
> 
> Problem: On solaris_sparc, the VM can set the TLAB's top pointer to a value past the end of the Java heap. The problem appears with large values of MinTLABSize. The reason for the problem is that the 'brcs' instruction at
> 
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3260
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3265
> 
> checks the condition codes in 'icc' (32-bit), but not in 'xcc' (64-bit).

I would simply replace 'br' with 'brx', which tests either xcc or icc depending on the architecture.

Best regards,
Tobias

> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes must be checked. Use 'BPcc' instead of 'Bicc' at these locations.
> 
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/
> 
> Testing:
> - JPRT
> - reproducer on solaris_sparc.
> 
> Thank you!
> 
> Best regards,
> 
> 
> Zoltan
> 

From zoltan.majo at oracle.com  Thu Apr 14 12:26:55 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 14 Apr 2016 14:26:55 +0200
Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past
	the end of the heap
In-Reply-To: <570F89D7.7080809@oracle.com>
References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com>
Message-ID: <570F8C8F.4080003@oracle.com>

Hi Tobias,


thank you for the feedback!

On 04/14/2016 02:15 PM, Tobias Hartmann wrote:
> [...]
> I would simply replace 'br' with 'brx', which tests either xcc or icc depending on the architecture.

Yes, that simplifies the code a bit. Here is the updated webrev:
http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/

Tests are running.

Thank you!

Best regards,


Zoltan

>
> Best regards,
> Tobias
>
>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes must be checked. Use 'BPcc' instead of 'Bicc' at these locations.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/
>>
>> Testing:
>> - JPRT
>> - reproducer on solaris_sparc.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>


From nils.eliasson at oracle.com  Thu Apr 14 12:32:47 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 14 Apr 2016 14:32:47 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <BA04AE25-E6E2-4510-B0C0-F11579F37A16@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com>
	<BA04AE25-E6E2-4510-B0C0-F11579F37A16@oracle.com>
Message-ID: <570F8DEF.7000504@oracle.com>

Yes, good feedback -

New webrev including your and Vladimir's suggestions:
http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/

Thanks for having a look!
Nils


On 2016-04-14 09:01, Igor Veresov wrote:
> Sorry for nitpicking, but can't the compile_reason argument be of type CompileReason instead of int everywhere? It would also be nice to place reason_name close to the enum.
>
> igor
>
>
>> On Apr 13, 2016, at 5:59 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>>
>> Hi,
>>
>> New webrev:
>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>>
>> Summary
>> Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change.
>>
>> The only new logic is the CompileTask::can_become_stale() method.
>>
>> Testing:
>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>
>> Regards,
>> Nils Eliasson
>>
>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>> hasn't increased during TieredCompileTaskTimeout.
>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>
>>>> I'll do a proper fix, it is the right thing to do and should be pretty
>>>> quick. I'll change the comment to an enum that represents who submitted
>>>> the compile, and add a table for the comments. This could be useful in
>>>> other settings too.
>>> Sounds good.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>> Regards,
>>>> Nils
>>>>
>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>> What do you mean "stale"?
>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>
>>>>>> Summary:
>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>> the compile queue as stale.
>>>>>>
>>>>>> Solution:
>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>> stale while the test is running. (Also added some extra
>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>
>>>>>> This is a workaround but we should consider fixing something
>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>> task with info about the origin of the compile. The comment field has
>>>>>> this information - but then it needs to be
>>>>>> converted to an enum.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>
>>>>>> Best regards,
>>>>>> Nils Eliasson
>>>>>>
>>>>>>
>>>>>>
>>>>>>


From tobias.hartmann at oracle.com  Thu Apr 14 12:33:48 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 14 Apr 2016 14:33:48 +0200
Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past
	the end of the heap
In-Reply-To: <570F8C8F.4080003@oracle.com>
References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com>
	<570F8C8F.4080003@oracle.com>
Message-ID: <570F8E2C.20605@oracle.com>

Hi Zoltan,

On 14.04.2016 14:26, Zoltán Majó wrote:
> Hi Tobias,
> 
> 
> thank you for the feedback!
> 
> On 04/14/2016 02:15 PM, Tobias Hartmann wrote:
>> [...]
>> I would simply replace 'br' with 'brx', which tests either xcc or icc depending on the architecture.
> 
> Yes, that simplifies the code a bit. Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/

Looks good!

Best regards,
Tobias

> 
> Tests are running.
> 
> Thank you!
> 
> Best regards,
> 
> 
> Zoltan
> 
>>
>> Best regards,
>> Tobias
>>
>>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes must be checked. Use 'BPcc' instead of 'Bicc' at these locations.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/
>>>
>>> Testing:
>>> - JPRT
>>> - reproducer on solaris_sparc.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
> 

From nils.eliasson at oracle.com  Thu Apr 14 12:43:06 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 14 Apr 2016 14:43:06 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570EBB79.7060805@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
Message-ID: <570F905A.4050202@oracle.com>

I moved the reasons to CompileTask.hpp and put them together with the
names list. I also changed the type from int to CompileReason, as Igor
suggested.

It gets verbose in the method declarations in compileBroker and 
sometimes I think CompileReason should be declared in CompileBroker 
because it is mostly used there. On the other hand, CompileTask is the 
keeper of the CompileReason so it makes sense too.
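A minimal sketch of the shape discussed in this thread, with hypothetical names: the reason enum lives in the task class next to its names table, can_become_stale() is a one-line inline member, and a WhiteBox-submitted task is never treated as stale. Staleness is modelled as "no invocation-counter progress for longer than the timeout" (in the spirit of TieredCompileTaskTimeout), using a timestamp of the last observed increase. This is not the webrev's code, and the real enum has more members.

```cpp
#include <cstdint>

// Hypothetical sketch, not the code from the webrev.
class CompileTaskSketch {
 public:
  enum CompileReason {
    Reason_None,
    Reason_Tiered,    // submitted by the tiered compilation policy
    Reason_Whitebox,  // submitted through the WhiteBox API
    Reason_Count
  };

  // Names table kept right next to the enum, indexed by CompileReason.
  static const char* reason_name(CompileReason r) {
    static const char* const names[Reason_Count] = {"no_reason", "tiered", "whitebox"};
    return names[r];
  }

  constexpr CompileTaskSketch(CompileReason reason, int64_t last_progress_ms)
      : _reason(reason), _last_progress_ms(last_progress_ms) {}

  // Small enough to live in the header and be inlined everywhere.
  constexpr bool can_become_stale() const { return _reason != Reason_Whitebox; }

  // Stale: no counter progress for longer than timeout_ms (illustrative model).
  constexpr bool is_stale(int64_t now_ms, int64_t timeout_ms) const {
    return can_become_stale() && now_ms - _last_progress_ms > timeout_ms;
  }

 private:
  CompileReason _reason;
  int64_t _last_progress_ms;
};
```

Typing the parameter as CompileReason rather than int, as suggested above, lets the compiler reject arbitrary integers at call sites.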

New webrev:
http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/

Thanks!
Nils

On 2016-04-13 23:34, Vladimir Kozlov wrote:
> Very nice, I like it.
>
> One note. CompileReason (and its names) should be in the CompileTask class,
> where it is recorded. Then CompileTask::can_become_stale() can be in the
> header file so it is inlined on all platforms.
>
> Thanks,
> Vladimir
>
> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>> Hi,
>>
>> New webrev:
>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>>
>> Summary
>> Introduced an enum CompileReason with members matching all the old
>> variants, and a table containing all the unchanged strings. I see the
>> possibility of removing/changing/simplifying some CompileReasons but
>> have chosen not to do so in this change.
>>
>> The only new logic is the CompileTask::can_become_stale() method.
>>
>> Testing:
>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>
>> Regards,
>> Nils Eliasson
>>
>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>> hasn't increased during TieredCompileTaskTimeout.
>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>
>>>> I'll do a proper fix, it is the right thing to do and should be pretty
>>>> quick. I'll change the comment to an enum that represents who submitted
>>>> the compile, and add a table for the comments. This could be useful in
>>>> other settings too.
>>>
>>> Sounds good.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Nils
>>>>
>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>> What do you mean "stale"?
>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>
>>>>>> Summary:
>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>> the compile queue as stale.
>>>>>>
>>>>>> Solution:
>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>> stale while the test is running. (Also added some extra
>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>
>>>>>> This is a workaround but we should consider fixing something
>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>> task with info about the origin of the compile. The comment field 
>>>>>> has
>>>>>> this information - but then it needs to be
>>>>>> converted to an enum.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>
>>>>>> Best regards,
>>>>>> Nils Eliasson
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>


From nils.eliasson at oracle.com  Thu Apr 14 13:17:47 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 14 Apr 2016 15:17:47 +0200
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
Message-ID: <570F987B.2070202@oracle.com>

Hi,

Please review this fix.

Summary:
In JDK-8150646 I added an assert in compile_method that the compiler
must not be NULL. Previously there was a return there that just ignored the
compile.

Running the VM with the flag combination -Xcomp and 
-XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is 
set to false (but the interpreter is still available) and then some 
essential methods are forced to be compiled, but the initial complevel 
becomes 0 and hits the assert in compileBroker.

Solution:
We could discuss whether it should be allowed to submit compiles on level 0,
a change that would become a bit larger. This time I chose to extend
the _initialized check in compile_method. I didn't add any logging or
warning because this is really a corner case.

Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
(Ignore the extra tags in the webrev)

Best regards,
Nils Eliasson
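
The situation above can be illustrated with a minimal sketch of an early-return guard. It assumes a simplified broker in which level 0 (interpreter only) has no compiler attached. The Broker and Compiler types are hypothetical. The level names are modeled after HotSpot's CompLevel values. None of this is the actual CompileBroker::compile_method code.

```cpp
#include <cassert>

// Level names modeled after HotSpot's CompLevel; level 0 = interpreter only.
enum CompLevel {
  CompLevel_none = 0,
  CompLevel_simple = 1,
  CompLevel_full_optimization = 4
};

struct Compiler { /* hypothetical placeholder for a JIT instance */ };

// Hypothetical, heavily simplified broker.
struct Broker {
  bool      initialized = false;
  Compiler* compilers[5] = {};  // indexed by level; nullptr if no compiler

  Compiler* compiler_for(int level) const {
    return (level >= 0 && level <= 4) ? compilers[level] : nullptr;
  }

  // Returns true if a compile was submitted. Requests that cannot be
  // served (broker not ready, or no compiler for the requested level,
  // as with -Xcomp -XX:TieredStopAtLevel=0) are ignored, not asserted on.
  bool compile_method(int comp_level) {
    Compiler* comp = compiler_for(comp_level);
    if (!initialized || comp == nullptr) {
      return false;
    }
    // ... enqueue a task for 'comp' here ...
    return true;
  }
};
```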

From zoltan.majo at oracle.com  Thu Apr 14 13:21:00 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 14 Apr 2016 15:21:00 +0200
Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating
	phi with unique input
Message-ID: <570F993C.3040509@oracle.com>

Hi,


please review the patch for 8153357.

https://bugs.openjdk.java.net/browse/JDK-8153357

Problem: When determining the unique input of a phi, the C2 compiler 
removes cast nodes connecting the phi to its unique input.
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181

Then (if the phi indeed has a unique input), the C2 compiler attempts 
to replace the phi with a cast node. The new cast node feeds from the 
unique input.

To be able to remove the phi node, the C2 compiler must determine 
the type of cast to add in place of the phi node (CastII, CastPP, or 
CheckCastPP).
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705

The failure in the bug report appears because the C2 compiler adds a 
cast node of unexpected type to the graph (a CheckCastPP instead of a 
CastPP when casting between two klass pointers).

Please find more details about the cause of the failure in the bug 
description:
https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108


Solution: Refine C2's logic to determine the type of cast node added.

Webrev:
http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/

Testing:
- JPRT;
- all hotspot compiler tests with RBT (-Xmixed, -Xcomp);
- 500 non-failing runs with the reproducer (the problem reproduces with 
< 100 runs).

Thank you and best regards,


Zoltan

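The cast-selection decision described above can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not C2's PhiNode code. The TypeKind enum and select_cast() are invented for illustration. The rule the sketch encodes is an assumption based on the failure described in the bug report:

- CastII for ints.
- CastPP when neither side is an oop pointer (for example, two klass pointers).
- CheckCastPP only when oop pointers are involved.

```cpp
#include <cassert>

// Hypothetical, simplified classification of C2 types.
enum class TypeKind { Int, OopPtr, KlassPtr, RawPtr };

enum class CastKind { CastII, CastPP, CheckCastPP };

// Choose the cast that replaces a phi with a unique input.
// Assumption: CheckCastPP is only appropriate when an oop pointer is
// involved; casts between non-oop pointers (e.g. klass pointers) use CastPP.
CastKind select_cast(TypeKind phi_type, TypeKind input_type) {
  if (phi_type == TypeKind::Int) {
    return CastKind::CastII;
  }
  bool phi_is_oop   = phi_type == TypeKind::OopPtr;
  bool input_is_oop = input_type == TypeKind::OopPtr;
  if (!phi_is_oop && !input_is_oop) {
    return CastKind::CastPP;
  }
  return CastKind::CheckCastPP;
}
```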

From zoltan.majo at oracle.com  Thu Apr 14 13:24:41 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 14 Apr 2016 15:24:41 +0200
Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past
	the end of the heap
In-Reply-To: <570F8E2C.20605@oracle.com>
References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com>
	<570F8C8F.4080003@oracle.com> <570F8E2C.20605@oracle.com>
Message-ID: <570F9A19.4090500@oracle.com>

Hi Tobias,


On 04/14/2016 02:33 PM, Tobias Hartmann wrote:
> [...]
> Looks good!

Thank you!

For the record: Testing with the reproducer was successful.

Best regards,


Zoltan

>
> Best regards,
> Tobias
>
>> Tests are running.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>> Best regards,
>>> Tobias
>>>
>>>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/
>>>>
>>>> Testing:
>>>> - JPRT
>>>> - reproducer on solaris_sparc.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>

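Some background for readers unfamiliar with SPARC. Bicc branches on the 32-bit integer condition codes (icc). BPcc can also test the 64-bit codes (xcc). The plain C++ sketch below (not SPARC code) shows why a 32-bit check is not enough when comparing 64-bit addresses, such as a new TLAB top against the heap end. The function names and the addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Full 64-bit unsigned comparison (what testing xcc amounts to).
bool exceeds_end_64(uint64_t new_top, uint64_t heap_end) {
  return new_top > heap_end;
}

// Comparison of the low 32 bits only (what testing icc amounts to).
bool exceeds_end_32(uint64_t new_top, uint64_t heap_end) {
  return static_cast<uint32_t>(new_top) > static_cast<uint32_t>(heap_end);
}
```

In this example, the 64-bit check correctly reports that new_top lies past the heap end. The 32-bit check only sees 0x10 versus 0xFFFFFFF0 and misses it.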

From anton.ivanov at oracle.com  Thu Apr 14 13:30:48 2016
From: anton.ivanov at oracle.com (Anton Ivanov)
Date: Thu, 14 Apr 2016 16:30:48 +0300
Subject: RFR(XS): 8154174: improve JitTester performance
Message-ID: <570F9B88.2040102@oracle.com>

Hi,
Please review this small patch that improves JitTester performance.

In the current implementation, JitTester has exception-based logic, which is 
not good in itself, but changing this is quite expensive, and there is a 
simple way to decrease the exception overhead: turn off the stack trace in the 
ProductionFailedException constructor (this exception is created very 
often and its stack trace is never needed, as it is only used to control 
program flow).
A small improvement was also made in the code that does a deep copy of 
SymbolTable elements (the Map iteration was rewritten to get rid of multiple 
redundant Map.get() calls, which cost O(1) only in the average case and 
could potentially be worse).

Testing: local

webrev: http://cr.openjdk.java.net/~aaivanov/8154174/webrev
bug: https://bugs.openjdk.java.net/browse/JDK-8154174

-- 
Best regards,
Anton Ivanov

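JitTester itself is written in Java, and the stack-trace part is Java-specific. One standard way to do it is Throwable's protected four-argument constructor with writableStackTrace set to false, though the patch may do it differently. The map-iteration idea, however, is language-independent: visit each entry once instead of iterating the keys and doing a second lookup per key. It is sketched here in C++. The Symbol type and deep_copy() are hypothetical stand-ins for a SymbolTable element.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical symbol record.
struct Symbol {
  std::string name;
  int flags;
};

using SymbolMap = std::map<std::string, std::vector<std::shared_ptr<Symbol>>>;

// Deep copy: every Symbol is cloned, so the result shares nothing with src.
// Iterating over (key, value) pairs avoids a separate src lookup per key.
SymbolMap deep_copy(const SymbolMap& src) {
  SymbolMap dst;
  for (const auto& entry : src) {
    std::vector<std::shared_ptr<Symbol>>& out = dst[entry.first];
    out.reserve(entry.second.size());
    for (const auto& sym : entry.second) {
      out.push_back(std::make_shared<Symbol>(*sym));
    }
  }
  return dst;
}
```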

From michael.c.berg at intel.com  Thu Apr 14 15:23:31 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 14 Apr 2016 15:23:31 +0000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E626CA@FMSMSX102.amr.corp.intel.com>

The code has been updated with the change from below:

webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.02a/

Regards,
Michael

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Wednesday, April 13, 2016 2:36 PM
To: Christian Thalinger <christian.thalinger at oracle.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: CR for RFR 8153998

See below for context.

Regards,
Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Wednesday, April 13, 2016 2:08 PM
To: Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>>
Cc: hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>
Subject: Re: CR for RFR 8153998


On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>> wrote:

Hi Folks,

I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.  See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.

This code was tested as follows (see jbs entry below):

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998

webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/


+//------------------------------MachMskNode-----------------------------------
+// Machine function Msk Node
+class MachMskNode : public MachIdealNode {
Does 'Msk' mean mask?  Then we should call it MachMaskNode.

<MCB> Ok, that's easy enough.

Also, I don't quite understand why we have:

+instruct set_mask(rRegI dst, rRegI src) %{
+  predicate(VM_Version::supports_avx512vl());
+  match(Set dst (MaskCreateI src));
+  effect(TEMP dst);
+  format %{ "createmsk   $dst, $src" %}
+  ins_encode %{
+    __ createmsk($dst$$Register, $src$$Register);
+  %}
but:

+  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
+    MacroAssembler _masm(&cbuf);
+    __ restoremsk();
+  }

The reason: Currently, k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects.
The case with set_mask is to take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop.
The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
The restore is fully a side effect with no produced definition in rule space, as mask registers are not formal definitions.

Thanks,
Michael

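To make the set_mask/restore pairing more concrete, here is a scalar C++ emulation of a masked post loop. It is a hedged sketch with three assumptions:

- The mask has one bit set per remaining iteration (the low n bits for n remaining iterations).
- The vector width is 16 lanes, for example 16 ints in a 512-bit register.
- The all-lanes mask loosely plays the role of the default value that is restored into k1.

It does not reproduce the actual createmsk or restoremsk code.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLanes = 16;  // assumed: 16 x 32-bit ints per 512-bit vector

// Lane mask with the low 'remaining' bits set (remaining >= 0);
// a full vector's worth or more yields the all-lanes default mask.
uint32_t create_mask(int remaining) {
  if (remaining >= kLanes) {
    return (1u << kLanes) - 1;
  }
  return (1u << remaining) - 1;
}

// One masked "vector" iteration: lanes whose mask bit is clear are not
// read or written, so the loop tail finishes in a single pass instead
// of a scalar loop over the remaining elements.
void masked_add(int* dst, const int* a, const int* b, uint32_t mask) {
  for (int lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) {
      dst[lane] = a[lane] + b[lane];
    }
  }
}
```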

From vladimir.x.ivanov at oracle.com  Thu Apr 14 16:53:22 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 14 Apr 2016 19:53:22 +0300
Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe
	accesses
Message-ID: <570FCB02.6000507@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8134918

Type speculation can produce mismatched unsafe accesses.

It injects a guard based on profile data and then propagates type info 
down to the users. If there's an unsafe access, it can become mismatched 
with respect to the profile data being used.

It happens even for valid usages. If an unsafe access always matches the 
memory location at runtime, the code produced by type speculation in 
that case is effectively dead.

What causes problems are unsafe OOP accesses (U.putObject()/getObject() 
on non-OOP locations).

The fix is to avoid intrinsification of problematic accesses. Type 
speculation injects precise type information, which is available during 
intrinsification.

We could try to support mismatched unsafe object accesses instead, but I 
don't see any value in that.

Testing: JPRT, pit-hs-comp (in progress).

Thanks!

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Thu Apr 14 16:54:09 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 14 Apr 2016 19:54:09 +0300
Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error
	when invoking nonexistent method
In-Reply-To: <570EBFA4.4060005@oracle.com>
References: <570E6D70.40904@oracle.com> <570EBFA4.4060005@oracle.com>
Message-ID: <570FCB31.5080709@oracle.com>

Thanks, Vladimir.

Best regards,
Vladimir Ivanov

On 4/14/16 12:52 AM, Vladimir Kozlov wrote:
> Looks good to me. The ciEnv.cpp changes look empty. If they are only spacing
> changes, we don't need to include them in the bug fix.
>
> thanks,
> Vladimir K
>
> On 4/13/16 9:01 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8154172
>>
>> C1 unconditionally inserts a null check before doing a call, even if the call
>> throws an error during linkage. This contradicts the JVMS, which requires that
>> linking errors precede run-time errors.
>>
>> The fix is to detect non-resolvable cases and avoid null checks /
>> profiling altogether, letting the runtime throw a linkage error.
>>
>> Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck).
>>
>> Some clarifications:
>>
>>     - klass->is_loaded() && !target->is_loaded() is true when method
>> resolution fails;
>>
>>     - static vs non-static checks aren't needed because
>> stream()->get_method already returns an unloaded method in such cases;
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov

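The decision described above can be summarized in a small sketch. The struct and function below are hypothetical. They model only two facts stated in the review:

- When the holder class is loaded but the target method is not, resolution failed.
- In that case C1 should skip the receiver null check, so that the runtime raises the linkage error first.

```cpp
#include <cassert>

// Hypothetical summary of what the compiler knows about a call site.
struct CallSiteInfo {
  bool klass_loaded;   // holder class is loaded
  bool target_loaded;  // target method resolved successfully
  bool has_receiver;   // non-static call
};

// Emit an explicit receiver null check only if the call can link;
// otherwise the runtime must report the linkage error before any
// NullPointerException, as the JVMS ordering requires.
bool should_emit_null_check(const CallSiteInfo& site) {
  bool resolution_failed = site.klass_loaded && !site.target_loaded;
  return site.has_receiver && !resolution_failed;
}
```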
From christian.thalinger at oracle.com  Mon Apr 11 17:59:42 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 11 Apr 2016 07:59:42 -1000
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com>
References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com>
Message-ID: <D32C2773-507F-4594-9B53-F2661E744C04@oracle.com>

[This should be on hotspot-runtime-dev.  BCC'ing hotspot-compiler-dev.]

> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii <HORII at jp.ibm.com> wrote:
> 
> Dear all:
> 
> Can I please request reviews for the following change?
> This change was created for JDK 9 and ppc64.
> 
> Description:
> This change adds options of compare-and-exchange for POWER architecture.
> As described in atomic_linux_ppc.inline.hpp, the current implementation of
> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
> general purposes because twice calls of sync before and after cmpxchg will
> keep consistency. However, they sometimes cause overheads because
> sync instructions are very expensive in the current POWER chip design.
> With this change, callers can explicitly specify to run fence and acquire with
> two additional bool parameters. Because their default values are "true",
> it is not necessary to modify existing cmpxchg calls. 
> 
> In addition, with the new parameters of cmpxchg, this change improves
> performance of copy_to_survivor in the parallel GC. 
> copy_to_survivor changes forward pointers by using cmpxchg. This 
> operation doesn't require any sync instructions, in my understanding. 
> A pointer is changed at most once in a GC and when cmpxchg fails, 
> the latest pointer is available for the caller.
> 
> When I evaluated SPECjbb2013 (slightly customized because the obsolete Grizzly
> version does not support the new Java 9 version-string format), young GC pause
> time was reduced by 10% to 20%.
> 
> Summary of source code changes:
> 
> * src/share/vm/runtime/atomic.hpp
> * src/share/vm/runtime/atomic.cpp
> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>        - Add two arguments, fence and acquire, to cmpxchg (PPC64 only).
>          Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches,
>          they are eliminated when it is inlined into callers.
> 
> * src/share/vm/oops/oop.inline.hpp
>       - Changed cas_set_mark to call cmpxchg without fence and acquire.
>          cas_set_mark is called only by cas_forward_to that is called only by
>          copy_to_survivor_space and oop_promotion_failed in 
>          psPromotionManager.
> 
> Code change:
> 
>    Please see an attached diff file that was generated with "hg diff -g" 
>    under the latest hotspot directory.
> 
> Passed test:
>     SPECjbb2013 (customized)
> 
> * I believe some other cmpxchg calls could be optimized by dropping the fence
>   or acquire, because two sync instructions are more conservative than needed
>   to implement the Java memory model.
> 
> 
> 
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64_cmpxchg_opt.diff
Type: application/octet-stream
Size: 8837 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160411/27ec3215/ppc64_cmpxchg_opt.diff>

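The forwarding-pointer pattern described in the proposal (a pointer is changed at most once, and a failed cmpxchg hands the caller the latest pointer) can be expressed with portable C++ atomics. This is a hedged sketch, not the proposed Atomic::cmpxchg change. The Header and Obj types are hypothetical. std::memory_order_relaxed stands in for calling cmpxchg without the fence and acquire.

```cpp
#include <atomic>
#include <cassert>

struct Obj { int payload; };  // hypothetical heap object

// Hypothetical object header holding a forwarding pointer.
struct Header {
  std::atomic<Obj*> forwardee{nullptr};
};

// Try to install 'copy' as the forwarding pointer. Exactly one thread
// wins; a losing thread gets the winner's pointer back and uses it.
Obj* forward_to(Header& h, Obj* copy) {
  Obj* expected = nullptr;
  if (h.forwardee.compare_exchange_strong(expected, copy,
                                          std::memory_order_relaxed,
                                          std::memory_order_relaxed)) {
    return copy;    // this thread installed its copy
  }
  return expected;  // on failure, 'expected' now holds the current value
}
```

A relaxed CAS alone does not publish the contents of the copied object to other threads. Whether the surrounding GC code already provides that guarantee is exactly what the review has to establish. The sketch shows only the "install once, otherwise use the winner's pointer" logic.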
From christian.thalinger at oracle.com  Thu Apr 14 18:15:53 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 14 Apr 2016 08:15:53 -1000
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <dk61t683cx2.fsf@rwestrel.remote.csb>
References: <dk61t683cx2.fsf@rwestrel.remote.csb>
Message-ID: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com>


> On Apr 13, 2016, at 8:46 PM, Roland Westrelin <rwestrel at redhat.com> wrote:
> 
> 
> When running scimark on aarch64:
> 
> ;; B16: #	B17 <- B21  top-of-loop Freq: 2305.21
> 
>  0x000003ffa126f710: add	w17, w11, w12   ;*iadd {reexecute=0 rethrow=0 return_oop=0}
>                                                ; - jnt.scimark2.FFT::transform_internal at 243 (line 129)
> 
>  0x000003ffa126f714: nop
>  0x000003ffa126f718: nop
>  0x000003ffa126f71c: nop                       ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0}
>                                                ; - jnt.scimark2.FFT::transform_internal at 238 (line 129)
> 
> ;; B17: #	B32 B18 <- B25 B16 	Loop: B17-B16 inner  Freq: 3056.06
> 
>  0x000003ffa126f720: lsl	w16, w17, #1    ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                                                ; - jnt.scimark2.FFT::transform_internal at 244 (line 129)
> 
> The 3 nops are added by the code that aligns loop entries: the top of
> loop block is first encountered and its alignment is set, the loop head
> is later encountered through the backbranch of an outer loop and its
> alignment is set.
> 
> I propose that the code that aligns loop entries verifies that a loop
> top doesn't exist before it sets the alignment:
> 
> http://cr.openjdk.java.net/~roland/8154135/webrev.00/

I wonder if this has any performance implications (good or bad).  This alignment is not AArch64-specific, so we have been doing it on all platforms all along.

> 
> Roland.

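The three nops in the listing are consistent with simple padding arithmetic. The sketch below assumes a 16-byte alignment target and 4-byte AArch64 instructions. The alignment value is inferred from the addresses, not read from the VM flags. Under those assumptions, the padding needed before the loop head at 0x000003ffa126f720 can be computed from the address where padding starts (0x000003ffa126f714):

```cpp
#include <cassert>
#include <cstdint>

// Bytes of padding needed to move 'addr' up to the next multiple of
// 'alignment' (alignment must be a power of two).
uint64_t padding_bytes(uint64_t addr, uint64_t alignment) {
  return (alignment - (addr & (alignment - 1))) & (alignment - 1);
}

// Number of fixed-size nop instructions that fill that padding.
uint64_t nops_needed(uint64_t addr, uint64_t alignment, uint64_t insn_size) {
  return padding_bytes(addr, alignment) / insn_size;
}
```

B16 belongs to the same inner loop (Loop: B17-B16). These nops are therefore emitted inside the loop body, which is what the subject line refers to.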

From christian.thalinger at oracle.com  Thu Apr 14 18:19:46 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 14 Apr 2016 08:19:46 -1000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
Message-ID: <B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>


> On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> See below for context.
>  
> Regards,
> Michael
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com <mailto:christian.thalinger at oracle.com>] 
> Sent: Wednesday, April 13, 2016 2:08 PM
> To: Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
> Cc: hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: CR for RFR 8153998
>  
>  
> On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>  
> Hi Folks,
> 
> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.  See: https://bugs.openjdk.java.net/browse/JDK-8151573 <https://bugs.openjdk.java.net/browse/JDK-8151573> for the first half of the implementation.
> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.
>  
> This code was tested as follows (see jbs entry below):
> 
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 <https://bugs.openjdk.java.net/browse/JDK-8153998>
> 
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ <http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/>
>  
> +//------------------------------MachMskNode-----------------------------------
> +// Machine function Msk Node
> +class MachMskNode : public MachIdealNode {
> Does 'Msk' mean mask?  Then we should call it MachMaskNode.
>  
> <MCB> Ok, that's easy enough.
>  
> Also, I don't quite understand why we have:
> +instruct set_mask(rRegI dst, rRegI src) %{
> +  predicate(VM_Version::supports_avx512vl());
> +  match(Set dst (MaskCreateI src));
> +  effect(TEMP dst);
> +  format %{ "createmsk   $dst, $src" %}
> +  ins_encode %{
> +    __ createmsk($dst$$Register, $src$$Register);
> +  %}
> but:
> +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
> +    MacroAssembler _masm(&cbuf);
> +    __ restoremsk();
> +  }
>  
> The reason: Currently, k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects.
> The case with set_mask is to take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop.
> The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
> The restore is fully a side effect with no produced definition in rule space, as mask registers are not formal definitions.

Hmm.  So, there is no way we can have a RestoreMaskINode?
>  
> Thanks,
> Michael
>  


From michael.c.berg at intel.com  Thu Apr 14 18:44:06 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 14 Apr 2016 18:44:06 +0000
Subject: CR for RFR 8153998
In-Reply-To: <B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>

Christian,

There is, but I would have to anchor it via a GPR; instead, I am treating it like nop emission (yet another side effect), with some guidance for placement.

Regards,
Michael


From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Thursday, April 14, 2016 11:20 AM
To: Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8153998


On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>> wrote:

See below for context.

Regards,
Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Wednesday, April 13, 2016 2:08 PM
To: Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>>
Cc: hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>
Subject: Re: CR for RFR 8153998


On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>> wrote:

Hi Folks,

I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.  See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.

This code was tested as follows (see jbs entry below):

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998

webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/


+//------------------------------MachMskNode-----------------------------------
+// Machine function Msk Node
+class MachMskNode : public MachIdealNode {
Does 'Msk' mean mask?  Then we should call it MachMaskNode.

<MCB> Ok, that's easy enough.

Also, I don't quite understand why we have:

+instruct set_mask(rRegI dst, rRegI src) %{
+  predicate(VM_Version::supports_avx512vl());
+  match(Set dst (MaskCreateI src));
+  effect(TEMP dst);
+  format %{ "createmsk   $dst, $src" %}
+  ins_encode %{
+    __ createmsk($dst$$Register, $src$$Register);
+  %}
but:

+  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
+    MacroAssembler _masm(&cbuf);
+    __ restoremsk();
+  }

The reason: Currently, k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects.
The case with set_mask is to take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop.
The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
The restore is fully a side effect with no produced definition in rule space, as mask registers are not formal definitions.

Hmm.  So, there is no way we can have a RestoreMaskINode?

Thanks,
Michael



From christian.thalinger at oracle.com  Thu Apr 14 18:45:59 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 14 Apr 2016 08:45:59 -1000
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570F905A.4050202@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
Message-ID: <CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>


> On Apr 14, 2016, at 2:43 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested.
> 
> It gets verbose in the method declarations in compileBroker

Don't worry about this.

> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too.

Yes, that's the right place.

> 
> New webrev:
> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/

+   bool         can_become_stale() const          {
+     return !_is_blocking && (_compile_reason < Reason_Whitebox);
+   }
I'm not a fan of implicit contracts just defined by comments.  This method doesn't seem to be performance critical, so I would suggest using a switch-case.  An attribute on the enum would be much better, but we all know this isn't Java.
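
A switch-based version of can_become_stale(), along the lines suggested here, might look like the sketch below. Reason_Whitebox appears in the quoted code. Every other enumerator name is a hypothetical placeholder, not an actual member of CompileReason in webrev.03.

```cpp
#include <cassert>

// Hypothetical subset of compile reasons; only Reason_Whitebox is taken
// from the quoted code.
enum CompileReason {
  Reason_None,
  Reason_Tiered,          // placeholder: policy-triggered compile
  Reason_BackedgeCount,   // placeholder: loop-triggered compile
  Reason_Whitebox,
  Reason_MustBeCompiled,  // placeholder: VM-forced compile
  Reason_Count
};

struct CompileTaskSketch {
  bool          _is_blocking;
  CompileReason _compile_reason;

  // Every reason is listed explicitly. Adding a new enumerator without
  // classifying it makes GCC/Clang warn (-Wswitch), instead of silently
  // depending on its position relative to Reason_Whitebox.
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_None:
      case Reason_Tiered:
      case Reason_BackedgeCount:
        return !_is_blocking;
      case Reason_Whitebox:
      case Reason_MustBeCompiled:
      case Reason_Count:
        return false;
    }
    return false;  // not reached for valid enumerators
  }
};
```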

> 
> Thanks!
> Nils
> 
> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>> Very nice, I like it.
>> 
>> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
>> 
>> Thanks,
>> Vladimir
>> 
>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>> Hi,
>>> 
>>> New webrev:
>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>>> 
>>> Summary
>>> Introduced an enum CompileReason with members matching all the old
>>> variants, and a table containing all the unchanged strings. I see the
>>> possibility of removing/changing/simplifying some CompileReasons but
>>> have chosen not to do so in this change.
>>> 
>>> Only new logic is the CompileTask::can_become_stale() method.
>>> 
>>> Testing:
>>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>> 
>>> Regards,
>>> Nils Eliasson
>>> 
>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>> 
>>>>> I'll do a proper fix; it is the right thing to do and should be pretty
>>>>> quick. I'll change the comment to an enum that represents who submitted
>>>>> the compile, and add a table for the comments. This could be useful in
>>>>> other settings too.
>>>> 
>>>> Sounds good.
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>>> 
>>>>> Regards,
>>>>> Nils
>>>>> 
>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>> What do you mean "stale"?
>>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>> 
>>>>>>> Summary:
>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>> the compile queue as stale.
>>>>>>> 
>>>>>>> Solution:
>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>> stale while the test is running. (Also added some extra
>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>> 
>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>> task with info about the origin of the compile. The comment field has
>>>>>>> this information - but then it needs to be
>>>>>>> converted to an enum.
>>>>>>> 
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Nils Eliasson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
> 


From vladimir.kozlov at oracle.com  Thu Apr 14 18:57:02 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 11:57:02 -0700
Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past
	the end of the heap
In-Reply-To: <570F8C8F.4080003@oracle.com>
References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com>
	<570F8C8F.4080003@oracle.com>
Message-ID: <570FE7FE.2000001@oracle.com>

Good.

Thanks,
Vladimir

On 4/14/16 5:26 AM, Zoltán Majó wrote:
> Hi Tobias,
>
>
> thank you for the feedback!
>
> On 04/14/2016 02:15 PM, Tobias Hartmann wrote:
>> [...]
>> I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture.
>
> Yes, that simplifies the code a bit. Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/
>
> Tests are running.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> Best regards,
>> Tobias
>>
>>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/
>>>
>>> Testing:
>>> - JPRT
>>> - reproducer on solaris_sparc.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>

From vladimir.kozlov at oracle.com  Thu Apr 14 19:02:12 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 12:02:12 -0700
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570F905A.4050202@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
Message-ID: <570FE934.5000800@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/14/16 5:43 AM, Nils Eliasson wrote:
> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested.
>
> It gets verbose in the method declarations in compileBroker and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is
> the keeper of the CompileReason so it makes sense too.
>
> New webrev:
> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/
>
> Thanks!
> Nils
>
> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>> Very nice, I like it.
>>
>> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> New webrev:
>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>>>
>>> Summary
>>> Introduced an enum CompileReason with members matching all the old
>>> variants, and a table containing all the unchanged strings. I see the
>>> possibility of removing/changing/simplifying some CompileReasons but
>>> have chosen not to do so in this change.
>>>
>>> Only new logic is the CompileTask::can_become_stale() method.
>>>
>>> Testing:
>>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>>
>>> Regards,
>>> Nils Eliasson
>>>
>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>
>>>>> I'll do a proper fix; it is the right thing to do and should be pretty
>>>>> quick. I'll change the comment to an enum that represents who submitted
>>>>> the compile, and add a table for the comments. This could be useful in
>>>>> other settings too.
>>>>
>>>> Sounds good.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Regards,
>>>>> Nils
>>>>>
>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>> What do you mean "stale"?
>>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>
>>>>>>> Summary:
>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>> the compile queue as stale.
>>>>>>>
>>>>>>> Solution:
>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>> stale while the test is running. (Also added some extra
>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>>
>>>>>>> This is a workaround, but we should consider a permanent fix
>>>>>>> for WB API compiles, like tagging the compile
>>>>>>> task with info about the origin of the compile. The comment field has
>>>>>>> this information, but then it needs to be
>>>>>>> converted to an enum.
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Nils Eliasson
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>

From igor.veresov at oracle.com  Thu Apr 14 22:15:20 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 14 Apr 2016 15:15:20 -0700
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <570F905A.4050202@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
Message-ID: <790C0A6F-8E06-4891-A771-9112606C6812@oracle.com>

Looks good. Thanks!

igor

> On Apr 14, 2016, at 5:43 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested.
> 
> It gets verbose in the method declarations in compileBroker and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too.
> 
> New webrev:
> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/
> 
> Thanks!
> Nils
> 
> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>> Very nice, I like it.
>> 
>> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
>> 
>> Thanks,
>> Vladimir
>> 
>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>> Hi,
>>> 
>>> New webrev:
>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>>> 
>>> Summary
>>> Introduced an enum CompileReason with members matching all the old
>>> variants, and a table containing all the unchanged strings. I see the
>>> possibility of removing/changing/simplifying some CompileReasons but
>>> have chosen not to do so in this change.
>>> 
>>> Only new logic is the CompileTask::can_become_stale() method.
>>> 
>>> Testing:
>>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>> 
>>> Regards,
>>> Nils Eliasson
>>> 
>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>> 
>>>>> I'll do a proper fix; it is the right thing to do and should be pretty
>>>>> quick. I'll change the comment to an enum that represents who submitted
>>>>> the compile, and add a table for the comments. This could be useful in
>>>>> other settings too.
>>>> 
>>>> Sounds good.
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>>> 
>>>>> Regards,
>>>>> Nils
>>>>> 
>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>> What do you mean "stale"?
>>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>> 
>>>>>>> Summary:
>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>> the compile queue as stale.
>>>>>>> 
>>>>>>> Solution:
>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>> stale while the test is running. (Also added some extra
>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>> 
>>>>>>> This is a workaround, but we should consider a permanent fix
>>>>>>> for WB API compiles, like tagging the compile
>>>>>>> task with info about the origin of the compile. The comment field has
>>>>>>> this information, but then it needs to be
>>>>>>> converted to an enum.
>>>>>>> 
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Nils Eliasson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
> 


From christian.thalinger at oracle.com  Thu Apr 14 22:35:03 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 14 Apr 2016 12:35:03 -1000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
Message-ID: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>


> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> Christian,
>  
> There is, but I would have to anchor it via a GPR. Instead, I am treating it like nop emits (yet another side effect), with some guidance for placement.

That's unfortunate but I understand.  I'm fine with it then.

>  
> Regards,
> Michael
>  
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] 
> Sent: Thursday, April 14, 2016 11:20 AM
> To: Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>  
>  
> On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>  
> See below for context.
>  
> Regards,
> Michael
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] 
> Sent: Wednesday, April 13, 2016 2:08 PM
> To: Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>  
>  
> On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>  
> Hi Folks,
> 
> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.  See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
> This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops.
> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication as implemented on fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.
>  
> This code was tested as follows (see JBS entry below):
> 
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
> 
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>  
> +//------------------------------MachMskNode-----------------------------------
> +// Machine function Msk Node
> +class MachMskNode : public MachIdealNode {
> Does "Msk" mean mask?  Then we should call it MachMaskNode.
>  
> <MCB> Ok, that's easy enough.
>  
> Also, I don't quite understand why we have:
> +instruct set_mask(rRegI dst, rRegI src) %{
> +  predicate(VM_Version::supports_avx512vl());
> +  match(Set dst (MaskCreateI src));
> +  effect(TEMP dst);
> +  format %{ "createmsk   $dst, $src" %}
> +  ins_encode %{
> +    __ createmsk($dst$$Register, $src$$Register);
> +  %}
> but:
> +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
> +    MacroAssembler _masm(&cbuf);
> +    __ restoremsk();
> +  }
>  
> The reason: currently k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects.
> The case with set_mask is to take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop.
> The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side-effect value will survive optimization.  
> The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions.
>  
> Hmm.  So, there is no way we can have a RestoreMaskINode?
>  
> Thanks,
> Michael


From vladimir.kozlov at oracle.com  Thu Apr 14 23:12:56 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 16:12:56 -0700
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <dk61t683cx2.fsf@rwestrel.remote.csb>
References: <dk61t683cx2.fsf@rwestrel.remote.csb>
Message-ID: <571023F8.5090903@oracle.com>

I agree with optimization but I am not sure about changes.

Can we check only one previous block to be more conservative?:

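// b: the block laid out immediately before targ_block
Block* b = prev(targ_block);
// If b is an already-aligned top-of-loop block, do not align targ_block too
bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment() && !b->head()->is_Loop();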

Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? Maybe for RISC CPUs (with fixed instruction size) we should change them.

Thanks,
Vladimir

On 4/13/16 11:46 PM, Roland Westrelin wrote:
>
> When running scimark on aarch64:
>
>   ;; B16: #	B17 <- B21  top-of-loop Freq: 2305.21
>
>    0x000003ffa126f710: add	w17, w11, w12   ;*iadd {reexecute=0 rethrow=0 return_oop=0}
>                                                  ; - jnt.scimark2.FFT::transform_internal at 243 (line 129)
>
>    0x000003ffa126f714: nop
>    0x000003ffa126f718: nop
>    0x000003ffa126f71c: nop                       ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0}
>                                                  ; - jnt.scimark2.FFT::transform_internal at 238 (line 129)
>
>   ;; B17: #	B32 B18 <- B25 B16 	Loop: B17-B16 inner  Freq: 3056.06
>
>    0x000003ffa126f720: lsl	w16, w17, #1    ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                                                  ; - jnt.scimark2.FFT::transform_internal at 244 (line 129)
>
> The 3 nops are added by the code that aligns loop entries: the top-of-loop
> block is encountered first and its alignment is set; the loop head is
> later reached through the back branch of an outer loop, and its
> alignment is set too.
>
> I propose that the code that aligns loop entries verifies that a loop
> top doesn't exist before it sets the alignment:
>
> http://cr.openjdk.java.net/~roland/8154135/webrev.00/
>
> Roland.
>

From vladimir.kozlov at oracle.com  Thu Apr 14 23:26:59 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 16:26:59 -0700
Subject: CR for RFR 8153998
In-Reply-To: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
Message-ID: <57102743.8080508@oracle.com>

On 4/14/16 3:35 PM, Christian Thalinger wrote:
>
>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>
>> Christian,
>> There is, but I would have to anchor it via a GPR. Instead, I am treating it like nop emits (yet another side effect), with some guidance for placement.
>
> That's unfortunate but I understand.  I'm fine with it then.

You can try to generate the mask restore on loop exit in the mach instruction for the CountedLoopEnd node, with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note that CreateMaskI will only be generated for a clean counted post loop with one vector iteration; there should not be any other branches there.

Vladimir

>
>> Regards,
>> Michael
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:*Thursday, April 14, 2016 11:20 AM
>> *To:*Berg, Michael C <michael.c.berg at intel.com>
>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>> *Subject:*Re: CR for RFR 8153998
>>
>>     On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>     See below for context.
>>     Regards,
>>     Michael
>>     *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>     *Sent:*Wednesday, April 13, 2016 2:08 PM
>>     *To:*Berg, Michael C <michael.c.berg at intel.com>
>>     *Cc:*hotspot-compiler-dev at openjdk.java.net
>>     *Subject:*Re: CR for RFR 8153998
>>
>>         On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>         Hi Folks,
>>
>>         I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>         This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops.
>>         Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication as implemented on fully enabled EVEX targets.  It delivers up to 2x
>>         performance and has been modeled over a large number of loop lengths and forms of loops.
>>         This code was tested as follows (see JBS entry below):
>>
>>         Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>
>>         webrev:
>>         http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>
>>     +//------------------------------MachMskNode-----------------------------------
>>     +// Machine function Msk Node
>>     +class MachMskNode : public MachIdealNode {
>>
>>     Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>     <MCB> Ok, that's easy enough.
>>     Also, I don't quite understand why we have:
>>
>>     +instruct set_mask(rRegI dst, rRegI src) %{
>>     +  predicate(VM_Version::supports_avx512vl());
>>     +  match(Set dst (MaskCreateI src));
>>     +  effect(TEMP dst);
>>     +  format %{ "createmsk   $dst, $src" %}
>>     +  ins_encode %{
>>     +    __ createmsk($dst$$Register, $src$$Register);
>>     +  %}
>>
>>     but:
>>
>>     +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>     +    MacroAssembler _masm(&cbuf);
>>     +    __ restoremsk();
>>     +  }
>>
>>     The reason: currently k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects.
>>     The case with set_mask is to take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop.
>>     The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side-effect value will survive optimization.
>>     The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions.
>>
>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>
>>         Thanks,
>>         Michael
>

From michael.c.berg at intel.com  Thu Apr 14 23:38:48 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 14 Apr 2016 23:38:48 +0000
Subject: CR for RFR 8153998
In-Reply-To: <57102743.8080508@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>

Vladimir,

Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.

I tried something like that early on with CountedLoopEnd.
The problem is that the resulting jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or keep it as I do it now?  We would basically be doing it in the same place.

-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, April 14, 2016 4:27 PM
To: Christian Thalinger <christian.thalinger at oracle.com>; Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8153998

On 4/14/16 3:35 PM, Christian Thalinger wrote:
>
>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>
>> Christian,
>> There is, but I would have to anchor it via a GPR. Instead, I am treating it like nop emits (yet another side effect), with some guidance for placement.
>
> That's unfortunate but I understand.  I'm fine with it then.

You can try to generate the mask restore on loop exit in the mach instruction for the CountedLoopEnd node, with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note that CreateMaskI will only be generated for a clean counted post loop with one vector iteration; there should not be any other branches there.

Vladimir

>
>> Regards,
>> Michael
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:*Thursday, April 14, 2016 11:20 AM
>> *To:*Berg, Michael C <michael.c.berg at intel.com>
>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>> *Subject:*Re: CR for RFR 8153998
>>
>>     On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>     See below for context.
>>     Regards,
>>     Michael
>>     *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>     *Sent:*Wednesday, April 13, 2016 2:08 PM
>>     *To:*Berg, Michael C <michael.c.berg at intel.com>
>>     *Cc:*hotspot-compiler-dev at openjdk.java.net
>>     *Subject:*Re: CR for RFR 8153998
>>
>>         On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>         Hi Folks,
>>
>>         I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>         This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops.
>>         Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication as implemented on fully enabled EVEX targets.  It delivers up to 2x
>>         performance and has been modeled over a large number of loop lengths and forms of loops.
>>         This code was tested as follows (see JBS entry below):
>>
>>         Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>
>>         webrev:
>>         http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>
>>     +//------------------------------MachMskNode-----------------------------------
>>     +// Machine function Msk Node
>>     +class MachMskNode : public MachIdealNode {
>>
>>     Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>     <MCB> Ok, that's easy enough.
>>     Also, I don't quite understand why we have:
>>
>>     +instruct set_mask(rRegI dst, rRegI src) %{
>>     +  predicate(VM_Version::supports_avx512vl());
>>     +  match(Set dst (MaskCreateI src));
>>     +  effect(TEMP dst);
>>     +  format %{ "createmsk   $dst, $src" %}
>>     +  ins_encode %{
>>     +    __ createmsk($dst$$Register, $src$$Register);
>>     +  %}
>>
>>     but:
>>
>>     +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>     +    MacroAssembler _masm(&cbuf);
>>     +    __ restoremsk();
>>     +  }
>>
>>     The reason: currently k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects.
>>     The case with set_mask is to take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop.
>>     The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side-effect value will survive optimization.
>>     The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions.
>>
>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>
>>         Thanks,
>>         Michael
>

From vladimir.kozlov at oracle.com  Thu Apr 14 23:41:39 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 16:41:39 -0700
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <570F987B.2070202@oracle.com>
References: <570F987B.2070202@oracle.com>
Message-ID: <57102AB3.30709@oracle.com>

I agree with this simple change as the fix.
Note that -Xcomp does not switch off the interpreter (we can't run without the interpreter). We use !UseInterpreter as an indication that -Xcomp was used.
I don't see a PIT link in the bug report.

Thanks,
Vladimir

On 4/14/16 6:17 AM, Nils Eliasson wrote:
> Hi,
>
> Please review this fix.
>
> Summary:
> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Previously there was a return there that just ignored the compile.
>
> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available), and then some essential methods are forced to be compiled, but the initial comp level becomes 0 and hits the assert in compileBroker.
>
> Solution:
> We could discuss whether it should be allowed to submit compiles on level 0, but that would be a somewhat larger change. This time I chose to extend the _initalized check in compile_method. I didn't add any logging or warning because this is really a corner case.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
> (Ignore the extra tags in the webrev)
>
> Best regards,
> Nils Eliasson

From vladimir.kozlov at oracle.com  Fri Apr 15 00:02:05 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 17:02:05 -0700
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
Message-ID: <57102F7D.2090303@oracle.com>

On 4/14/16 4:38 PM, Berg, Michael C wrote:
> Vladimir,
>
> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>
> I tried something like that early on with CountedLoopEnd.

In the mach version with the mask restore you should either specify the correct size(X) or not specify it at all, in which case the size is calculated dynamically (done automatically).
I don't see any side effects for restoremask in your code. What are you talking about?

I am suggesting something like next:

instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl)
%{
   // Only selected for loop ends whose post loop programmed k1
   predicate(n->has_vect_mask_set());
   match(CountedLoopEnd cop cr);
   effect(USE labl);

   ins_cost(400);
   format %{ "j$cop     $labl\t# loop end\n\t"
             "restoremask \t# vector mask restore for loops"
          %}
   ins_encode %{
     Label* L = $labl$$label;
     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
     // Fall-through is the loop exit: put k1 back to its default value
     __ restoremsk();
   %}
   ins_pipe(pipe_jcc);
%}

Vladimir

> The problem is that the resulting jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or keep it as I do it now?  We would basically be doing it in the same place.
>
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 4:27 PM
> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>
> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>
>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>
>>> Christian,
>>> There is, but I would have to anchor it via a GPR. Instead, I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>
>> That's unfortunate but I understand.  I'm fine with it then.
>
> You can try to generate the mask restore on loop exit in the mach instruction for the CountedLoopEnd node, with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note that CreateMaskI will only be generated for a clean counted post loop with one vector iteration; there should not be any other branches there.
>
> Vladimir
>
>>
>>> Regards,
>>> Michael
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:*Thursday, April 14, 2016 11:20 AM
>>> *To:*Berg, Michael C <michael.c.berg at intel.com>
>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>> *Subject:*Re: CR for RFR 8153998
>>>
>>>      On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>      See below for context.
>>>      Regards,
>>>      Michael
>>>      *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>      *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>      *To:*Berg, Michael C <michael.c.berg at intel.com>
>>>      *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>      *Subject:*Re: CR for RFR 8153998
>>>
>>>          On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>          Hi Folks,
>>>
>>>          I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>          This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops.
>>>          Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication as implemented on fully enabled EVEX targets.  It delivers up to 2x
>>>          performance and has been modeled over a large number of loop lengths and forms of loops.
>>>          This code was tested as follows (see JBS entry below):
>>>
>>>          Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>
>>>          webrev:
>>>          http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>
>>>
>>> +//------------------------------MachMskNode-------------------------
>>> ----------
>>>
>>>      +// Machine function Msk Node
>>>
>>>      +class MachMskNode : public MachIdealNode {
>>>
>>>      Does ?Msk? mean mask?  Then we should call it MachMaskNode.
>>>      <MCB> Ok, that?s easy enough.
>>>      Also, I don?t quite understand why we have:
>>>
>>>      +instruct set_mask(rRegI dst, rRegI src) %{
>>>
>>>      +  predicate(VM_Version::supports_avx512vl());
>>>
>>>      +  match(Set dst (MaskCreateI src));
>>>
>>>      +  effect(TEMP dst);
>>>
>>>      +  format %{ "createmsk   $dst, $src" %}
>>>
>>>      +  ins_encode %{
>>>
>>>      +    __ createmsk($dst$$Register, $src$$Register);
>>>
>>>      +  %}
>>>
>>>      but:
>>>
>>>      +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>
>>>      +    MacroAssembler _masm(&cbuf);
>>>
>>>      +    __ restoremsk();
>>>
>>>      +  }
>>>
>>>      The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>      The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>      The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>      The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>
>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>
>>>          Thanks,
>>>          Michael
>>
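A minimal, non-authoritative sketch of the idea Michael describes (turning the post loop's remaining iteration count into a mask so that one masked vector iteration replaces the scalar fixup loop). It assumes 16 lanes of 32-bit ints, matching a 16-bit AVX-512 opmask such as k1. `post_loop_mask` and `masked_add_post_loop` are hypothetical names that model the semantics in portable C++; they are not HotSpot's `createmsk`/`restoremsk` code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: lanes in one vector iteration (16 x 32-bit in a 512-bit register).
constexpr int kLanes = 16;

// Build a mask with the low `remaining` bits set, e.g. remaining == 3 -> 0b111.
// Assumes 0 <= remaining; an all-ones mask plays the role of the default k1 value.
std::uint16_t post_loop_mask(int remaining) {
  if (remaining >= kLanes) return 0xFFFF;
  return static_cast<std::uint16_t>((1u << remaining) - 1u);
}

// One masked "vector" iteration: lanes whose mask bit is clear are not touched,
// so a single pass covers the tail without reading or writing past it.
void masked_add_post_loop(std::int32_t* a, const std::int32_t* b,
                          std::size_t start, std::uint16_t mask) {
  for (int lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) {
      a[start + lane] += b[start + lane];
    }
  }
}
```

For example, a 35-element loop whose vectorized main loop handled 32 elements leaves 3 remaining iterations, so the mask is 0b111 and elements 32..34 are processed in one pass.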

From michael.c.berg at intel.com  Fri Apr 15 00:12:30 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 15 Apr 2016 00:12:30 +0000
Subject: CR for RFR 8153998
In-Reply-To: <57102F7D.2090303@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>

The restore mask is sizeless; it restores k1 to the known global configuration value, which is used automatically throughout code gen.

Ok, I will try the pattern match method.

Thanks
-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, April 14, 2016 5:02 PM
To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger <christian.thalinger at oracle.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8153998

On 4/14/16 4:38 PM, Berg, Michael C wrote:
> Vladimir,
>
> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>
> I tried something like that early on with CountedLoopEnd.

In the Mach version with restore mask you should either specify the correct size(X) or not specify it at all, so that it is calculated dynamically (done automatically).
I don't see any side effects for restoremask in your code. What are you talking about?

I am suggesting something like the following:

instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
   predicate(n->has_vect_mask_set());
   match(CountedLoopEnd cop cr);
   effect(USE labl);

   ins_cost(400);
   format %{ "j$cop     $labl\t# loop end\n\t"
             "restoremask \t# vector mask restore for loops"
          %}
   ins_encode %{
     Label* L = $labl$$label;
     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
     __ restoremask();
   %}
   ins_pipe(pipe_jcc);
%}

Vladimir

> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or keep it the way I do it now?  We would basically be doing it in the same place.
>
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 4:27 PM
> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg, 
> Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>
> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>
>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>
>>> Christian,
>>> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>
>> That's unfortunate but I understand.  I'm fine with it then.
>
> You can try to generate the restore mask on loop exit in a Mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note that CreateMaskI will only be generated for a clean counted post loop with one vector iteration; there should not be any other branches there.
>
> Vladimir
>
>>
>>> Regards,
>>> Michael
>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>> *Sent:*Thursday, April 14, 2016 11:20 AM
>>> *To:*Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>> *Cc:*hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>> *Subject:*Re: CR for RFR 8153998
>>>
>>>      On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>      See below for context.
>>>      Regards,
>>>      Michael
>>>      *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>      *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>      *To:*Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>      *Cc:*hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>      *Subject:*Re: CR for RFR 8153998
>>>
>>>          On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>          Hi Folks,
>>>
>>>          I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>          This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>          Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x
>>>          performance and has been modeled over a large number of loop lengths and forms of loops.
>>>          This code was tested as follows (see JBS entry below):
>>>
>>>          Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>
>>>          webrev:
>>>          http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>
>>>
>>> +//------------------------------MachMskNode-----------------------------------
>>>
>>>      +// Machine function Msk Node
>>>
>>>      +class MachMskNode : public MachIdealNode {
>>>
>>>      Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>>      <MCB> Ok, that's easy enough.
>>>      Also, I don't quite understand why we have:
>>>
>>>      +instruct set_mask(rRegI dst, rRegI src) %{
>>>
>>>      +  predicate(VM_Version::supports_avx512vl());
>>>
>>>      +  match(Set dst (MaskCreateI src));
>>>
>>>      +  effect(TEMP dst);
>>>
>>>      +  format %{ "createmsk   $dst, $src" %}
>>>
>>>      +  ins_encode %{
>>>
>>>      +    __ createmsk($dst$$Register, $src$$Register);
>>>
>>>      +  %}
>>>
>>>      but:
>>>
>>>      +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>
>>>      +    MacroAssembler _masm(&cbuf);
>>>
>>>      +    __ restoremsk();
>>>
>>>      +  }
>>>
>>>      The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>      The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>      The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>      The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>
>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>
>>>          Thanks,
>>>          Michael
>>
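To illustrate Vladimir's suggestion, here is a hypothetical toy model (not ADLC or HotSpot code) of predicate-guarded rule selection for the loop-end branch. It assumes the plain loop-end rule is guarded by the negated predicate, so the two rules never compete on cost. `LoopEndNode`, `select_rule`, and the instruction strings are illustrative names only.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a CountedLoopEnd node.
struct LoopEndNode {
  bool vect_mask_set = false;  // set after the mask-creating node is generated
  bool has_vect_mask_set() const { return vect_mask_set; }
};

// Returns the instruction sequence a matcher would emit for this node,
// assuming complementary predicates on the two rules.
std::vector<std::string> select_rule(const LoopEndNode& n) {
  if (n.has_vect_mask_set()) {
    // jmpLoopEnd_and_restoreMask: the long jcc form has a fixed size, and the
    // restore sits after the back-branch, so it runs only on the fall-through
    // (loop-exit) path.
    return {"jcc (long)", "restoremask"};
  }
  // Plain jmpLoopEnd: branch only.
  return {"jcc"};
}
```

Because the restore is part of the same rule as the branch, its size is accounted for together with the jcc, which is the sizing concern raised in the thread.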

From vladimir.kozlov at oracle.com  Fri Apr 15 00:46:52 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 17:46:52 -0700
Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after
	eliminating phi with unique input
In-Reply-To: <570F993C.3040509@oracle.com>
References: <570F993C.3040509@oracle.com>
Message-ID: <571039FC.1000603@oracle.com>

I think the check should use !isa_oopptr(), since one of the nodes could be a ConP NULL pointer, which is not a klassptr.

Thanks,
Vladimir

On 4/14/16 6:21 AM, Zoltán Majó wrote:
> Hi,
>
>
> please review the patch for 8153357.
>
> https://bugs.openjdk.java.net/browse/JDK-8153357
>
> Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its unique input.
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181
>
> Then (if the phi has indeed a unique input), the C2 compiler attempts to replace the phi with a cast node. The new cast node feeds from the unique input.
>
> To be able to remove the phi node, the C2 compiler must determine the type of cast to add in place of the phi node (CastII, CastPP, or CheckCastPP).
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705
>
> The failure in the bug report appears because the C2 compiler adds a cast node of unexpected type to the graph (a CheckCastPP instead of a CastPP when casting between two klass pointers).
>
> Please find more details about the cause of the failure in the bug description:
> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108
>
>
> Solution: Refine C2's logic to determine the type of cast node added.
>
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/
>
> Testing:
> - JPRT;
> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp);
> - 500 non-failing runs with the reproducer (the problem reproduces with < 100 runs).
>
> Thank you and best regards,
>
>
> Zoltan
>
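A hypothetical toy model of the cast-selection decision discussed in this thread, assuming a simplified type classification (these are not HotSpot's `Type` classes or the actual patch): integer phis get a CastII, pairs where neither type is an oop pointer get a CastPP, and the rest go to CheckCastPP. Testing "not an oop pointer" instead of "is a klass pointer" routes a NULL constant pointer (`ConP` NULL), which is neither, the same way as a klass pointer, as Vladimir suggests.

```cpp
#include <string>

// Hypothetical, simplified classification of a node's type.
enum class Kind { Int, OopPtr, KlassPtr, NullPtr };

bool isa_oopptr(Kind k) { return k == Kind::OopPtr; }

// Choose the cast that replaces a phi with a unique input.
// An isa_klassptr()-based test would send (KlassPtr, NullPtr) down the
// CheckCastPP path; the !isa_oopptr() test keeps it on the CastPP path.
std::string choose_cast(Kind phi_type, Kind input_type) {
  if (phi_type == Kind::Int) return "CastII";
  if (!isa_oopptr(phi_type) && !isa_oopptr(input_type)) return "CastPP";
  return "CheckCastPP";
}
```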

From vladimir.kozlov at oracle.com  Fri Apr 15 00:51:44 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 17:51:44 -0700
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
Message-ID: <57103B20.1040207@oracle.com>

On 4/14/16 5:12 PM, Berg, Michael C wrote:
> The restore mask is sizeless; it restores k1 to the known global configuration value, which is used automatically throughout code gen.

How is it sizeless when it generates a kmovwl() instruction?
Do you mean it does not have side effects (no flags modified)?

Vladimir


From michael.c.berg at intel.com  Fri Apr 15 00:54:01 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 15 Apr 2016 00:54:01 +0000
Subject: CR for RFR 8153998
In-Reply-To: <57103B20.1040207@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
	<57103B20.1040207@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>

OK, now that we understand one another: I have added a size calculation for the new pattern match, which accurately maps the emit size in the 32-bit and 64-bit .ad files for CountedLoopEnd.
It will be clean the next time you see the code.

-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, April 14, 2016 5:52 PM
To: Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8153998

On 4/14/16 5:12 PM, Berg, Michael C wrote:
> The restore mask is sizeless; it restores k1 to the known global configuration value, which is used automatically throughout code gen.

How is it sizeless when it generates a kmovwl() instruction?
Do you mean it does not have side effects (no flags modified)?

Vladimir


From vladimir.kozlov at oracle.com  Fri Apr 15 01:12:18 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 18:12:18 -0700
Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched
	unsafe accesses
In-Reply-To: <570FCB02.6000507@oracle.com>
References: <570FCB02.6000507@oracle.com>
Message-ID: <57103FF2.5060907@oracle.com>

The following assert should be at the beginning of the method:
+     assert(type != T_OBJECT || !unaligned, "unaligned access not supported with object type");

Fix the copyright year in the test.

There is no PIT link in the bug report.

Thanks,
Vladimir

On 4/14/16 9:53 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8134918
>
> Type speculation can produce mismatched unsafe accesses.
>
> It injects a guard based on profile data and then propagates type info down to the users. If there's an unsafe access, it can become mismatched w.r.t. the profile data being used.
>
> It happens even for valid usages. If an unsafe access always matches the memory location at runtime, the code produced by type speculation in that case is effectively dead.
>
> What causes problems are unsafe OOP accesses (U.putObject()/getObject() on non-OOP locations).
>
> The fix is to avoid intrinsification of problematic accesses. Type speculation injects precise type information, which is available during intrinsification.
>
> We could try to support mismatched unsafe object accesses instead, but I don't see any value in that.
>
> Testing: JPRT, pit-hs-comp (in progress).
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
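A hypothetical sketch of the bail-out described above, with the precondition assert moved to the top of the function as requested in the review. The enum, the function name, and the `location_is_oop` flag (standing in for whatever the compiler knows about the accessed location, possibly from type speculation) are illustrative assumptions, not the code in the webrev.

```cpp
#include <cassert>

// Hypothetical, simplified basic types.
enum BasicType { T_INT, T_LONG, T_OBJECT };

// Decide whether an Unsafe access may be intrinsified.
bool can_intrinsify_unsafe(BasicType access, bool location_is_oop, bool unaligned) {
  // Check preconditions first, before any early return.
  assert((access != T_OBJECT || !unaligned) &&
         "unaligned access not supported with object type");
  // getObject/putObject on a non-oop location is a mismatched access:
  // skip the intrinsic and let the call take the non-intrinsic path.
  if (access == T_OBJECT && !location_is_oop) return false;
  return true;
}
```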

From vladimir.kozlov at oracle.com  Fri Apr 15 01:44:44 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Apr 2016 18:44:44 -0700
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
	<f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
	<570632FF.7090103@redhat.com>
	<805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap>
	<63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap>
Message-ID: <5710478C.8050200@oracle.com>

Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results.

Thanks,
Vladimir

On 4/12/16 2:45 AM, Doerr, Martin wrote:
> Hi,
>
> I think we have come to a common understanding and there was no complaint about my latest webrev:
> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>
> Can I consider it reviewed?
> Can somebody sponsor, please?
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin
> Sent: Donnerstag, 7. April 2016 12:52
> To: Andrew Haley <aph at redhat.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
>
> Hi Andrew, Jamsheed and all,
>
> thank you very much for your input.
>
> As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count().
> Therefore, I have replaced the storestore barrier introduced with JDK-8143897  (even though this barrier was also correct).
>
> My change still contains a releasing store for newly created ExceptionCache instances.
> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms.
> I think having the release doesn't hurt too much and makes the design a little cleaner.
>
> I also added comments based on your input.
>
> The new webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>
> Please review. I will also need a sponsor from Oracle, please.
>
> Thanks again and best regards,
> Martin
>
>
> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Donnerstag, 7. April 2016 12:14
> To: Doerr, Martin <martin.doerr at sap.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
>
> On 07/04/16 10:08, Doerr, Martin wrote:
>
>> atomic update for the _count would only be required if there were
>> multiple threads which attempt to increment it
>> concurrently. However, updates are under lock, so we only have
>> concurrent readers which is ok.
>>
>> I still think "volatile" does what we need here. Especially the xlC
>> compiler on AIX tends to reload variables from memory. Exactly this
>> can be prevented by making the field volatile.
>
> I think your latest patch is OK.  Whether volatile is really good
> enough, I don't know.  The new(ish) C++ memory model treats this as a
> race, and therefore undefined behaviour.  Old C++ didn't have a memory
> model, so the best we can do with racy code is guess about what our
> compilers might do.
>
> I certainly much prefer a release_store to the storestore fence used
> in the fix for 8143897.
>
> Andrew.
>
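For readers following the memory-ordering argument, here is a minimal C++11 sketch of the publication pattern discussed above. It assumes a simplified fixed-size cache of (pc, handler) pairs. Writers append under a lock and publish the new count with a release store. Readers load the count with acquire semantics and never take the lock. The class and member names are hypothetical and do not reflect HotSpot's ExceptionCache or OrderAccess. std::atomic is used so that the sketch stays race-free under the C++11 memory model mentioned in the thread.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical sketch (not HotSpot code): a fixed-size cache of
// (pc, handler) pairs with one writer at a time and lock-free readers.
class ExceptionCacheSketch {
 public:
  static constexpr int kCapacity = 16;

  // Writer side: appends under the lock, then publishes with a release store.
  bool add(int pc, int handler) {
    assert(handler >= 0 && "-1 is reserved to signal a miss");
    std::lock_guard<std::mutex> guard(lock_);
    // Writers are serialized by the lock, so a relaxed load suffices here.
    int n = count_.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;
    pcs_[n] = pc;
    handlers_[n] = handler;
    // Release store: the entry written above becomes visible to any reader
    // that observes the new count with an acquire load.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side: no lock. A reader that sees an older count may miss a
  // freshly added entry (a false negative) but never sees a half-written one.
  int lookup(int pc) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return -1;
  }

  int count() const { return count_.load(std::memory_order_acquire); }

 private:
  std::mutex lock_;
  std::atomic<int> count_{0};
  int pcs_[kCapacity] = {};
  int handlers_[kCapacity] = {};
};
```

The acquire load on the reader side is what pairs with the writer's release store; a release store alone does not order the reader's later loads of the entry fields.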

From zoltan.majo at oracle.com  Fri Apr 15 06:46:20 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 15 Apr 2016 08:46:20 +0200
Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past
	the end of the heap
In-Reply-To: <570FE7FE.2000001@oracle.com>
References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com>
	<570F8C8F.4080003@oracle.com> <570FE7FE.2000001@oracle.com>
Message-ID: <57108E3C.4030209@oracle.com>

Thank you, Vladimir and Tobias, for the review!

Best regards,


Zoltan

On 04/14/2016 08:57 PM, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir
>
> On 4/14/16 5:26 AM, Zoltán Majó wrote:
>> Hi Tobias,
>>
>>
>> thank you for the feedback!
>>
>> On 04/14/2016 02:15 PM, Tobias Hartmann wrote:
>>> [...]
>>> I would simply replace the 'br' by 'brx' which tests either xcc or 
>>> icc depending on the architecture.
>>
>> Yes, that simplifies the code a bit. Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/
>>
>> Tests are running.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Best regards,
>>> Tobias
>>>
>>>> Solution: As the VM is handling addresses at the above-mentioned 
>>>> locations, the appropriate condition codes are supposed to be 
>>>> checked. Use 'BPcc' instead of 'Bicc' at these locations.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/
>>>>
>>>> Testing:
>>>> - JPRT
>>>> - reproducer on solaris_sparc.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>
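On SPARC V9, Bicc branches on the 32-bit icc flags, while BPcc can test the 64-bit xcc flags. The following portable C++ illustration is not SPARC code, and its function names and addresses are made up. It shows why a comparison that effectively looks at only the low 32 bits of two addresses can conclude that a new TLAB end is still below the heap end when it is not:

```cpp
#include <cassert>
#include <cstdint>

// 64-bit unsigned comparison: corresponds to branching on xcc.
bool past_end_64(std::uint64_t new_top, std::uint64_t heap_end) {
  return new_top > heap_end;
}

// Comparison of the low 32 bits only: corresponds to branching on icc, whose
// flags describe just the lower word of the 64-bit subtraction.
bool past_end_low32(std::uint64_t new_top, std::uint64_t heap_end) {
  return static_cast<std::uint32_t>(new_top) > static_cast<std::uint32_t>(heap_end);
}

// Intended check: a TLAB of tlab_size bytes at tlab_start fits below heap_end.
bool tlab_fits(std::uint64_t tlab_start, std::uint64_t tlab_size, std::uint64_t heap_end) {
  assert(tlab_start + tlab_size >= tlab_start && "address arithmetic overflowed");
  return !past_end_64(tlab_start + tlab_size, heap_end);
}
```

With new_top = 0x100000100 and heap_end = 0xFFFFFF00, the 64-bit test reports the overrun and the low-word test does not. Testing xcc avoids this kind of mismatch.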


From jamsheed.c.m at oracle.com  Fri Apr 15 07:44:34 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Fri, 15 Apr 2016 13:14:34 +0530
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <5710478C.8050200@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
	<f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
	<570632FF.7090103@redhat.com>
	<805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap>
	<63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap>
	<5710478C.8050200@oracle.com>
Message-ID: <57109BE2.1090602@oracle.com>

Hi Vladimir,

PIT testing is in progress; the link is available in the bug report.

Best Regards,
Jamsheed

On 4/15/2016 7:14 AM, Vladimir Kozlov wrote:
> Looks fine to me. Jamsheed, please, run our PIT testing with these 
> changes and analyze results.
>
> Thanks,
> Vladimir
>
> On 4/12/16 2:45 AM, Doerr, Martin wrote:
>> Hi,
>>
>> I think we have come to a common understanding and there was no 
>> complaint about my latest webrev:
>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>>
>> Can I consider it reviewed?
>> Can somebody sponsor, please?
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev 
>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>> Doerr, Martin
>> Sent: Donnerstag, 7. April 2016 12:52
>> To: Andrew Haley <aph at redhat.com>; Jamsheed C m 
>> <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: RE: RFR(S): 8153267: nmethod's exception cache not 
>> multi-thread safe
>>
>> Hi Andrew, Jamsheed and all,
>>
>> thank you very much for your input.
>>
>> As Andrew, Jamsheed and I think, it's better to have a releasing 
>> store in increment_count().
>> Therefore, I have replaced the storestore barrier introduced with 
>> JDK-8143897  (even though this barrier was also correct).
>>
>> My change still contains a releasing store for newly created 
>> ExceptionCache instances.
>> As Jamsheed has pointed out, this should not be strictly required as 
>> we have the other barrier. It may only produce additional false 
>> negatives on weak memory model platforms.
>> I think having the release doesn't hurt too much and makes the design 
>> a little cleaner.
>>
>> I also added comments based on your input.
>>
>> The new webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>>
>> Please review. I will also need a sponsor from Oracle, please.
>>
>> Thanks again and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Andrew Haley [mailto:aph at redhat.com]
>> Sent: Donnerstag, 7. April 2016 12:14
>> To: Doerr, Martin <martin.doerr at sap.com>; Jamsheed C m 
>> <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR(S): 8153267: nmethod's exception cache not 
>> multi-thread safe
>>
>> On 07/04/16 10:08, Doerr, Martin wrote:
>>
>>> atomic update for the _count would only be required if there were
>>> multiple threads which attempt to increment it
>>> concurrently. However, updates are under lock, so we only have
>>> concurrent readers which is ok.
>>>
>>> I still think "volatile" does what we need here. Especially the xlC
>>> compiler on AIX tends to reload variables from memory. Exactly this
>>> can be prevented by making the field volatile.
>>
>> I think your latest patch is OK.  Whether volatile is really good
>> enough, I don't know.  The new(ish) C++ memory model treats this as a
>> race, and therefore undefined behaviour.  Old C++ didn't have a memory
>> model, so the best we can do with racy code is guess about what our
>> compilers might do.
>>
>> I certainly much prefer a release_store to the storestore fence used
>> in the fix for 8143897.
>>
>> Andrew.
>>


From michael.c.berg at intel.com  Fri Apr 15 09:04:25 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 15 Apr 2016 09:04:25 +0000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
	<57103B20.1040207@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>

Vladimir, the code has been updated and is available at:

webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ 

Thanks,
Michael

-----Original Message-----
From: Berg, Michael C 
Sent: Thursday, April 14, 2016 5:54 PM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: CR for RFR 8153998

OK, now that we understand one another, I have added a size calculation for the new pattern match, which will accurately map the emit size in the 32-bit and 64-bit .ad files for CountedLoopEnd.
It will be clean when next you see the code.

-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, April 14, 2016 5:52 PM
To: Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8153998

On 4/14/16 5:12 PM, Berg, Michael C wrote:
> The restore mark is sizeless; it restores k1 to the known global configuration value, which is used automatically throughout code generation.

How it is sizeless when it generates kmovwl() instruction?
Do you mean it does not have side effects (no flags modified)?

Vladimir

>
> Ok, I will try the pattern match method.
>
> Thanks
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 5:02 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger 
> <christian.thalinger at oracle.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>
> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>> Vladimir,
>>
>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>>
>> I tried something like that early on with CountedLoopEnd.
>
> In the mach version with restore mark you should either specify the correct size(X) or not specify it at all, so that it is calculated dynamically (done automatically).
> I don't see any side effects for restoremask in your code. What are you talking about?
>
> I am suggesting something like next:
>
> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>     predicate(n->has_vect_mask_set());
>     match(CountedLoopEnd cop cr);
>     effect(USE labl);
>
>     ins_cost(400);
>     format %{ "j$cop     $labl\t# loop end\n\t"
>               "restoremask \t# vector mask restore for loops"
>            %}
>     ins_encode %{
>       Label* L = $labl$$label;
>       __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>       __ restoremask();
>     %}
>     ins_pipe(pipe_jcc);
> %}
>
> Vladimir
>
>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or the way I do it now?  We would basically be doing it in the same place.
>>
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 4:27 PM
>> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg, 
>> Michael C <michael.c.berg at intel.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>>
>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>
>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>
>>>> Christian,
>>>> There is, but I would have to anchor it via a GPR; instead, I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>
>>> That's unfortunate, but I understand.  I'm fine with it then.
>>
>> You can try to generate the restore mask on loop exit in a mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note, CreateMaskI will only be generated for a clean counted post loop with one vector iteration; there should not be any other branches there.
>>
>> Vladimir
>>
>>>
>>>> Regards,
>>>> Michael
>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>> Sent: Thursday, April 14, 2016 11:20 AM
>>>> To: Berg, Michael C <michael.c.berg at intel.com>
>>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: CR for RFR 8153998
>>>>
>>>>       On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>       See below for context.
>>>>       Regards,
>>>>       Michael
>>>>       From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>       Sent: Wednesday, April 13, 2016 2:08 PM
>>>>       To: Berg, Michael C <michael.c.berg at intel.com>
>>>>       Cc: hotspot-compiler-dev at openjdk.java.net
>>>>       Subject: Re: CR for RFR 8153998
>>>>
>>>>           On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>           Hi Folks,
>>>>
>>>>           I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>           This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>           Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x
>>>>           performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>           This code was tested as follows (see the JBS entry below):
>>>>
>>>>           Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>
>>>>           webrev:
>>>>           http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>
>>>>
>>>>       +//------------------------------MachMskNode-----------------------------------
>>>>
>>>>       +// Machine function Msk Node
>>>>
>>>>       +class MachMskNode : public MachIdealNode {
>>>>
>>>>       Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>>>       <MCB> Ok, that's easy enough.
>>>>       Also, I don't quite understand why we have:
>>>>
>>>>       +instruct set_mask(rRegI dst, rRegI src) %{
>>>>
>>>>       +  predicate(VM_Version::supports_avx512vl());
>>>>
>>>>       +  match(Set dst (MaskCreateI src));
>>>>
>>>>       +  effect(TEMP dst);
>>>>
>>>>       +  format %{ "createmsk   $dst, $src" %}
>>>>
>>>>       +  ins_encode %{
>>>>
>>>>       +    __ createmsk($dst$$Register, $src$$Register);
>>>>
>>>>       +  %}
>>>>
>>>>       but:
>>>>
>>>>       +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>
>>>>       +    MacroAssembler _masm(&cbuf);
>>>>
>>>>       +    __ restoremsk();
>>>>
>>>>       +  }
>>>>
>>>>       The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>>       The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>>       The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>>       The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>>
>>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>>
>>>>           Thanks,
>>>>           Michael
>>>
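The following is a scalar C++ model of the masked post loop described above. It assumes 16 lanes of 32-bit ints (one 512-bit vector). make_post_loop_mask stands in for what set_mask/createmsk computes into k1, and masked_add stands in for one EVEX-masked vector operation. Both names are hypothetical, and the real code emits vector instructions rather than a lane loop:

```cpp
#include <cassert>
#include <cstdint>

// Assumed vector width: 16 x int32 lanes, i.e. one 512-bit register.
constexpr int kLanes = 16;

// Bit i is set for each lane i that still has work to do.
std::uint32_t make_post_loop_mask(int remaining) {
  assert(remaining >= 0 && remaining <= kLanes);
  return (1u << remaining) - 1u;
}

// Scalar model of one masked vector add: lanes whose mask bit is clear are
// neither read nor written, so memory past the last element is not touched.
void masked_add(std::int32_t* dst, const std::int32_t* a, const std::int32_t* b,
                std::uint32_t mask) {
  for (int lane = 0; lane < kLanes; lane++) {
    if (mask & (1u << lane)) dst[lane] = a[lane] + b[lane];
  }
}

// Post loop for the tail [start, n): at most kLanes elements remain after
// the main vector loop, so a single masked iteration completes it.
void post_loop(std::int32_t* dst, const std::int32_t* a, const std::int32_t* b,
               int start, int n) {
  std::uint32_t mask = make_post_loop_mask(n - start);
  masked_add(dst + start, a + start, b + start, mask);
}
```

Masked-off lanes are skipped, so no scalar fixup iterations remain.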

From nils.eliasson at oracle.com  Fri Apr 15 09:39:41 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 15 Apr 2016 11:39:41 +0200
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <57102AB3.30709@oracle.com>
References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com>
Message-ID: <5710B6DD.9090009@oracle.com>

Thanks Vladimir!
On 2016-04-15 01:41, Vladimir Kozlov wrote:
> I agree with this simple change as the fix.
> Note, -Xcomp does not switch off Interpreter (we can run without 
> Interpreter). We use !UseInterpreter as indication if Xcomp was used.
> I don't see a PIT link in the bug report.

There was none, Tobias found this regression testing something else.

Now I have added a regression test: 
hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java

Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/

Regards,
Nils

>
> Thanks,
> Vladimir
>
> On 4/14/16 6:17 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Please review this fix.
>>
>> Summary:
>> In JDK-8150646 I added an assert in compile_method that the compiler 
>> must not be NULL. Before there was a return there that just ignored 
>> the compile.
>>
>> Running the VM with the flag combination -Xcomp and 
>> -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter 
>> is set to false (but the interpreter is still available) and then 
>> some
>> essential methods are forced to be compiled, but the initial 
>> complevel becomes 0 and hits the assert in compileBroker.
>>
>> Solution:
>> We could discuss if it should be allowed to submit compiles on level 
>> 0, a change that would become a bit larger. This time I chose to 
>> extend the _initialized check in compile_method. I didn't add any
>> logging or warning because this is really a corner case.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
>> (Ignore the extra tags in the webrev)
>>
>> Best regards,
>> Nils Eliasson
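The following is a simplified C++ model of the guard described in the summary. A request that arrives before the broker is initialized is dropped instead of reaching the assert. So is a request for level 0, for which no compiler exists (the -Xcomp plus -XX:TieredStopAtLevel=0 case). CompileBrokerSketch and the exact form of the condition are assumptions for illustration. The CompLevel values follow HotSpot's numbering, where 0 means interpreter only:

```cpp
#include <cassert>

// HotSpot numbers tiers from 0 (interpreter only) to 4 (full optimization);
// only the values needed here are listed.
enum CompLevel {
  CompLevel_none = 0,
  CompLevel_simple = 1,
  CompLevel_full_optimization = 4
};

// Hypothetical stand-in for the broker; not the real CompileBroker API.
struct CompileBrokerSketch {
  bool initialized = false;
  int submitted = 0;

  // Returns true if a compile task would be queued.
  bool compile_method(CompLevel level) {
    assert(level >= CompLevel_none && level <= CompLevel_full_optimization);
    // Extended early return: besides "not yet initialized", also drop
    // level-0 requests, since no compiler exists for that level.
    if (!initialized || level == CompLevel_none) {
      return false;
    }
    submitted++;
    return true;
  }
};
```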


From rwestrel at redhat.com  Fri Apr 15 11:14:41 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 15 Apr 2016 13:14:41 +0200
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com>
References: <dk61t683cx2.fsf@rwestrel.remote.csb>
	<05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com>
Message-ID: <5710CD21.4060902@redhat.com>


Hi Christian,

Thanks for looking at this.

> I wonder if this has any performance implications (good or bad).
> This alignment is not aarch64 specific so we were doing it all the
> time.

Unless I'm missing something, nops in the body of a loop can't really
help performance. This looks like a bug to me.

Roland.

From tobias.hartmann at oracle.com  Fri Apr 15 11:15:42 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 15 Apr 2016 13:15:42 +0200
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <5710B6DD.9090009@oracle.com>
References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com>
	<5710B6DD.9090009@oracle.com>
Message-ID: <5710CD5E.5070103@oracle.com>

Hi Nils,

On 15.04.2016 11:39, Nils Eliasson wrote:
> Thanks Vladimir!
> On 2016-04-15 01:41, Vladimir Kozlov wrote:
>> I agree with this simple change as the fix.
>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used.
>> I don't see a PIT link in the bug report.
> 
> There was none, Tobias found this regression testing something else.
> 
> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java
> 
> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/

Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do with support for blocking compiles.

Otherwise looks good to me.

Best regards,
Tobias

> 
> Regards,
> Nils
> 
>>
>> Thanks,
>> Vladimir
>>
>> On 4/14/16 6:17 AM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> Please review this fix.
>>>
>>> Summary:
>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return there that just ignored the compile.
>>>
>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available) and then some
>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker.
>>>
>>> Solution:
>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I chose to extend the _initialized check in compile_method. I didn't add any
>>> logging or warning because this is really a corner case.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
>>> (Ignore the extra tags in the webrev)
>>>
>>> Best regards,
>>> Nils Eliasson
> 

From nils.eliasson at oracle.com  Fri Apr 15 11:22:59 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 15 Apr 2016 13:22:59 +0200
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <5710CD5E.5070103@oracle.com>
References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com>
	<5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com>
Message-ID: <5710CF13.5090404@oracle.com>

Hi Tobias,

Thanks for your feedback!

New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03

Regards,
Nils

On 2016-04-15 13:15, Tobias Hartmann wrote:
> Hi Nils,
>
> On 15.04.2016 11:39, Nils Eliasson wrote:
>> Thanks Vladimir!
>> On 2016-04-15 01:41, Vladimir Kozlov wrote:
>>> I agree with this simple change as the fix.
>>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used.
>>> I don't see a PIT link in the bug report.
>> There was none, Tobias found this regression testing something else.
>>
>> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java
>>
>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/
> Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do with support for blocking compiles.
>
> Otherwise looks good to me.
>
> Best regards,
> Tobias
>
>> Regards,
>> Nils
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/14/16 6:17 AM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> Please review this fix.
>>>>
>>>> Summary:
>>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return there that just ignored the compile.
>>>>
>>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available) and then some
>>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker.
>>>>
>>>> Solution:
>>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I chose to extend the _initialized check in compile_method. I didn't add any
>>>> logging or warning because this is really a corner case.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
>>>> (Ignore the extra tags in the webrev)
>>>>
>>>> Best regards,
>>>> Nils Eliasson


From tobias.hartmann at oracle.com  Fri Apr 15 11:24:10 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 15 Apr 2016 13:24:10 +0200
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <5710CF13.5090404@oracle.com>
References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com>
	<5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com>
	<5710CF13.5090404@oracle.com>
Message-ID: <5710CF5A.3090409@oracle.com>

Hi Nils,

On 15.04.2016 13:22, Nils Eliasson wrote:
> Hi Tobias,
> 
> Thanks for your feedback!
> 
> New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03

Looks good!

Best regards,
Tobias

> 
> Regards,
> Nils
> 
> On 2016-04-15 13:15, Tobias Hartmann wrote:
>> Hi Nils,
>>
>> On 15.04.2016 11:39, Nils Eliasson wrote:
>>> Thanks Vladimir!
>>> On 2016-04-15 01:41, Vladimir Kozlov wrote:
>>>> I agree with this simple change as the fix.
>>>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used.
>>>> I don't see a PIT link in the bug report.
>>> There was none, Tobias found this regression testing something else.
>>>
>>> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java
>>>
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/
>> Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do with support for blocking compiles.
>>
>> Otherwise looks good to me.
>>
>> Best regards,
>> Tobias
>>
>>> Regards,
>>> Nils
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/14/16 6:17 AM, Nils Eliasson wrote:
>>>>> Hi,
>>>>>
>>>>> Please review this fix.
>>>>>
>>>>> Summary:
>>>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return there that just ignored the compile.
>>>>>
>>>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available) and then some
>>>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker.
>>>>>
>>>>> Solution:
>>>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I chose to extend the _initialized check in compile_method. I didn't add any
>>>>> logging or warning because this is really a corner case.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
>>>>> (Ignore the extra tags in the webrev)
>>>>>
>>>>> Best regards,
>>>>> Nils Eliasson
> 

From vladimir.x.ivanov at oracle.com  Fri Apr 15 11:42:40 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 15 Apr 2016 14:42:40 +0300
Subject: RFR(m): 8145468 deprecations for java.lang
In-Reply-To: <57107B6B.5070100@javaspecialists.eu>
References: <570EF756.2090602@oracle.com>
	<570FAB0C.7070505@javaspecialists.eu> <57101D11.6060109@oracle.com>
	<5710234A.2060001@oracle.com> <57107B6B.5070100@javaspecialists.eu>
Message-ID: <5710D3B0.6010903@oracle.com>


>>> I had a sidebar with Shipilev on this, and this is indeed still
>>> potentially an issue. Alexey's example was:
>>>
>>>      set.contains(new Integer(i))      // 1
>>>
>>> vs
>>>
>>>      set.contains(Integer.valueOf(i))  // 2
>>>
>>> EA is able to optimize away the allocation in line 1, but the additional
>>> complexity of dealing with the Integer cache in valueOf() defeats EA for
>>> line 2. (Autoboxing pretty much desugars to line 2.)
>>
>> I'd say it's a motivating example to improve EA implementation in C2,
>> but not to avoid deprecation of public constructors in primitive type
>> boxes. It shouldn't matter for C2 whether Integer.valueOf() or
>> Integer::<init> is used. If it does, it's a bug.
>>
> To do that would probably require a change to the Java Language
> Specification to allow us to get rid of the IntegerCache.  Unfortunately
> it is defined to have a range of -128 to 127 at least in the cache, so
> this probably makes it really hard or impossible to optimize this with
> EA.  I always found it amusing that the killer application for EA,
> getting rid of autoboxed Integer objects, didn't really work :-)

Still, I'd separate optimization and specification aspects.

This case is neither "really hard" nor impossible to optimize.
What is hard is to ensure the optimization covers all interesting cases :-)

AFAIK C2 should already do a pretty decent job of eliminating box/unbox 
pairs (e.g., Integer.valueOf().intValue()) and the cache is not a 
problem here.

What can cause problems is when box identity intervenes. For example, 
even for non-escaping objects the runtime has to be able to materialize 
them at safepoints. In order to preserve identity invariants, the 
runtime has to take into account how the box is created (constructor vs 
factory method).

Probably, that's the missing case right now. But there's nothing 
insurmountable to fix it - the runtime should just consult the cache in 
the rare cases when rematerialization is necessary.

Best regards,
Vladimir Ivanov
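The rematerialization argument above can be modeled in C++. All names are hypothetical, and this is not HotSpot's deoptimization code. An eliminated box records whether it came from a constructor or from the valueOf factory. When the box must be materialized, the runtime reuses the cache only in the valueOf case, assuming the default cache range of -128 to 127:

```cpp
#include <cassert>
#include <deque>

struct Box { int value; };

enum class BoxOrigin { Constructor, ValueOf };

// Hypothetical model of a runtime owning a valueOf-style box cache.
class BoxRuntime {
 public:
  static constexpr int kLow = -128;
  static constexpr int kHigh = 127;

  BoxRuntime() {
    for (int v = kLow; v <= kHigh; v++) cache_[v - kLow].value = v;
  }

  static bool in_cache_range(int v) { return v >= kLow && v <= kHigh; }

  // Factory semantics: values in the cache range always map to one object.
  Box* value_of(int v) { return in_cache_range(v) ? cached(v) : allocate(v); }

  // Constructor semantics: every call yields a fresh object.
  Box* allocate(int v) {
    heap_.push_back(Box{v});
    return &heap_.back();
  }

  // Materializes an eliminated box (e.g. at a safepoint) so that its identity
  // matches what the original allocation site would have produced.
  Box* rematerialize(int v, BoxOrigin origin) {
    return origin == BoxOrigin::ValueOf ? value_of(v) : allocate(v);
  }

 private:
  Box* cached(int v) {
    assert(in_cache_range(v));
    return &cache_[v - kLow];
  }

  Box cache_[kHigh - kLow + 1];
  std::deque<Box> heap_;  // deque keeps element addresses stable on push_back
};
```

A constructor-origin box always gets a fresh object, so identity comparisons behave as they would have without scalar replacement.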

From rwestrel at redhat.com  Fri Apr 15 12:24:30 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 15 Apr 2016 14:24:30 +0200
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <571023F8.5090903@oracle.com>
References: <dk61t683cx2.fsf@rwestrel.remote.csb> <571023F8.5090903@oracle.com>
Message-ID: <5710DD7E.1060105@redhat.com>

Hi Vladimir,

Thanks for looking at this.

> I agree with optimization but I am not sure about changes.

Is this an optimization? It looks more like a bug to me.

> Can we check only one previous block to be more conservative?:
> 
> Block* b = prev(targ_block);
> bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment()
>                && !b->head()->is_Loop();

That would be good enough as far as I can tell. Here is a new webrev:

http://cr.openjdk.java.net/~roland/8154135/webrev.01/

> Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? Maybe
> for RISC CPUs (with fixed instruction size) we should change them.

Thanks for the pointer. This said, I don't see what could prevent the
problem I see from happening on x86 so to me it looks like a bug, rather
than a tuning problem.

Roland.
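The following is a toy C++ model of the placement rule being debated. It assumes a straight-line block layout in which each block records whether it heads a loop and which loop it belongs to. The fields and functions are hypothetical, not C2's Block API, and nested loops are ignored for brevity. Padding is emitted before a loop head only when the block laid out just before it lies outside that loop; otherwise the nops would sit inside the loop body:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout model; not C2's Block API.
struct BlockSketch {
  bool is_loop_head;  // first block of a loop
  int loop_id;        // loop containing the block, -1 if none
  int size;           // code size in bytes
};

// Number of nop bytes needed to move `offset` up to a multiple of `alignment`.
int padding_for(int offset, int alignment) {
  assert(offset >= 0 && alignment > 0);
  return (alignment - offset % alignment) % alignment;
}

// Pad before block i only if it heads a loop and the preceding block is
// outside that loop; otherwise the nops would execute inside the loop body.
bool should_align(const std::vector<BlockSketch>& blocks, std::size_t i) {
  if (!blocks[i].is_loop_head) return false;
  if (i == 0) return true;
  return blocks[i - 1].loop_id != blocks[i].loop_id;
}

// Total code size of the layout, including any alignment padding.
int layout_size(const std::vector<BlockSketch>& blocks, int alignment) {
  int offset = 0;
  for (std::size_t i = 0; i < blocks.size(); i++) {
    if (should_align(blocks, i)) offset += padding_for(offset, alignment);
    offset += blocks[i].size;
  }
  return offset;
}
```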

From nils.eliasson at oracle.com  Fri Apr 15 13:06:58 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 15 Apr 2016 15:06:58 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
	<CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
Message-ID: <5710E772.5050801@oracle.com>

Hi,

On 2016-04-14 20:45, Christian Thalinger wrote:
>
>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>>
>> I moved the reasons to CompileTask.hpp and put it together with the 
>> names list. Also changed the type from int to CompileReason as Igor 
>> suggested.
>>
>> It gets verbose in the method declarations in compileBroker
>
> Don't worry about this.
>
>> and sometimes I think CompileReason should be declared in 
>> CompileBroker because it is mostly used there. On the other hand, 
>> CompileTask is the keeper of the CompileReason so it makes sense too.
>
> Yes, that's the right place.
>
>>
>> New webrev:
>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ 
>> <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.03/>
>
> *+ bool can_become_stale() const {*
> *+ return !_is_blocking && (_compile_reason < Reason_Whitebox);*
> *+ }*
> I'm not a fan of implicit contracts just defined by comments.  This 
> method doesn't seem to be performance critical so I would suggest to 
> use a switch-case.  An attribute on the enum would be much better but 
> we all know this isn't Java.

As you suggested:
http://cr.openjdk.java.net/~neliasso/8153013/webrev.04

Also made reasons CTW and Replay not stale-able.
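A switch over the reason makes the contract explicit instead of relying on the enum's ordering. The sketch below is hypothetical: apart from `Reason_Whitebox`, `can_become_stale()`, and the CTW and Replay reasons named in this thread, the enum members and the `CompileTaskModel` struct are assumptions, not the contents of webrev.04. The `is_stale()` helper is likewise a simplified model of the eviction rule quoted further down (a queued task is evicted when its invocation counter has not increased within `TieredCompileTaskTimeout`).

```cpp
#include <cstdint>

// Hypothetical subset of compile reasons; names other than
// Reason_Whitebox are illustrative assumptions.
enum CompileReason {
  Reason_None,
  Reason_Tiered,    // submitted by the tiered policy
  Reason_Whitebox,  // WhiteBox API request (tests)
  Reason_CTW,       // CompileTheWorld
  Reason_Replay     // compilation replay
};

struct CompileTaskModel {
  CompileReason reason;
  bool is_blocking;
  int64_t last_invocation_count;  // counter value at the last check
  int64_t last_check_time_ms;     // time of that check

  // Explicit switch instead of "reason < Reason_Whitebox".
  bool can_become_stale() const {
    if (is_blocking) return false;
    switch (reason) {
      case Reason_Whitebox:
      case Reason_CTW:
      case Reason_Replay:
        return false;  // externally requested: never evict
      default:
        return true;
    }
  }

  // Simplified eviction rule: stale only if allowed and the counter did
  // not increase within the timeout.
  bool is_stale(int64_t now_ms, int64_t invocation_count,
                int64_t timeout_ms) const {
    return can_become_stale() &&
           invocation_count <= last_invocation_count &&
           now_ms - last_check_time_ms > timeout_ms;
  }
};
```

With this shape, a WhiteBox-submitted task can no longer be dropped from the queue regardless of the timeout value, which is what the test workaround was compensating for.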

Thanks!
Nils

>
>>
>> Thanks!
>> Nils
>>
>> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>>> Very nice, I like it.
>>>
>>> One note. CompileReason (and its names) should be in the CompileTask class 
>>> where it is recorded. Then CompileTask::can_become_stale() can be in the 
>>> header file so it is inlined on all platforms.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> New webrev:
>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ 
>>>> <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.02/>
>>>>
>>>> Summary
>>>> Introduced an enum CompileReason with members matching all the old
>>>> variants, and a table containing all the unchanged strings. I see the
>>>> possibility of removing/changing/simplifying some CompileReasons but
>>>> have chosen not to do so in this change.
>>>>
>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>
>>>> Testing:
>>>> Running Testset hotspot on all platforms and hotspot_all on one 
>>>> platform
>>>>
>>>> Regards,
>>>> Nils Eliasson
>>>>
>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>>
>>>>>> I'll do a proper fix, it is the right thing to do and should be 
>>>>>> pretty
>>>>>> quick. I'll change the comment to an enum that represents who 
>>>>>> submitted
>>>>>> the compile, and add a table for the comments. This could be 
>>>>>> useful in
>>>>>> other settings too.
>>>>>
>>>>> Sounds good.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Nils
>>>>>>
>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>>> What do you mean "stale"?
>>>>>>> I would prefer to see the real fix as you suggested to avoid 
>>>>>>> removing
>>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>>> the compile queue as stale.
>>>>>>>>
>>>>>>>> Solution:
>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>>> stale while the test is running. (Also added some extra
>>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>>>
>>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>>> task with info about the origin of the compile. The comment 
>>>>>>>> field has
>>>>>>>> this information - but then it needs to be
>>>>>>>> converted to an enum.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Nils Eliasson
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>


From nils.eliasson at oracle.com  Fri Apr 15 13:30:29 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 15 Apr 2016 15:30:29 +0200
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <57102AB3.30709@oracle.com>
References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com>
Message-ID: <5710ECF5.80201@oracle.com>

I forgot the link to the test job:

This covers both this fix and JDK-8153013 (BlockingCompilation test times out):
https://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1309-10848

Regards,
//Nils

On 2016-04-15 01:41, Vladimir Kozlov wrote:
> I agree with this simple change as the fix.
> Note, -Xcomp does not switch off Interpreter (we can run without 
> Interpreter). We use !UseInterpreter as indication if Xcomp was used.
> I don't see a PIT link in the bug report.
>
> Thanks,
> Vladimir
>
> On 4/14/16 6:17 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Please review this fix.
>>
>> Summary:
>> In JDK-8150646 I added an assert in compile_method that the compiler 
>> must not be NULL. Before there was a return there that just ignored 
>> the compile.
>>
>> Running the VM with the flag combination -Xcomp and 
>> -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter 
>> is set to false (but the interpreter is still available) and then 
>> some
>> essential methods are forced to be compiled, but the initial 
>> complevel becomes 0 and hits the assert in compileBroker.
>>
>> Solution:
>> We could discuss if it should be allowed to submit compiles on level 
>> 0, a change that would become a bit larger. This time I chose to 
>> extend the _initialized check in compile_method. I didn't add any
>> logging or warning because this is really a corner case.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
>> (Ignore the extra tags in the webrev)
>>
>> Best regards,
>> Nils Eliasson
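The shape of this fix can be sketched as an early return in the compile entry point: with -Xcomp and -XX:TieredStopAtLevel=0, the requested level maps to no compiler, so the request is dropped instead of reaching the assert. All names below (`Compiler`, `compiler_for_level`, `compile_method_model`, the level-to-compiler mapping) are hypothetical illustrations, not the code in the webrev.

```cpp
struct Compiler { int id; };  // hypothetical placeholder for a JIT compiler

static Compiler c1{1};
static Compiler c2{2};

// Hypothetical mapping: level 0 is the interpreter, so no compiler
// handles it.
Compiler* compiler_for_level(int comp_level) {
  if (comp_level >= 1 && comp_level <= 3) return &c1;
  if (comp_level == 4) return &c2;
  return nullptr;
}

// Returns true if a compile was submitted. Instead of asserting that a
// compiler exists, the request is ignored when the broker is not
// initialized or no compiler handles the requested level.
bool compile_method_model(bool broker_initialized, int comp_level) {
  Compiler* comp = compiler_for_level(comp_level);
  if (!broker_initialized || comp == nullptr) {
    return false;  // corner case: nothing to compile with, skip quietly
  }
  // ... enqueue a task for `comp` (omitted in this sketch) ...
  return true;
}
```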


From nils.eliasson at oracle.com  Fri Apr 15 15:10:35 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 15 Apr 2016 17:10:35 +0200
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
Message-ID: <5711046B.9080808@oracle.com>

Hi,

Please review this fix of print opto_assembly.

Summary:
The compilelog can get corrupted and the VM may assert on "failed: bad 
tag in log".

When printing assembly in output.cpp we first take the ttylock, print 
the header and then the method metadata. However, the metadata printing 
makes a VM entry and may block for a safepoint, which releases the 
lock (break_tty_lock_for_safepoint). Another compiler thread that has 
not safepointed can then take the lock, so by the time the safepoint is 
over and the first thread resumes logging, the log is already broken.

Solution:
Print the method metadata to a temporary buffer, then take the tty lock.
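The ordering in this fix (render the part that may block into a private buffer, and only then take the shared lock) can be illustrated with standard C++. This is an analogy under stated assumptions: `std::mutex` stands in for the tty lock, `describe_method()` is a hypothetical stand-in for the metadata printing that may reach a safepoint, and `std::string` plays the role of a growable buffer.

```cpp
#include <mutex>
#include <sstream>
#include <string>

std::mutex tty_lock;     // stands in for HotSpot's tty lock
std::ostringstream tty;  // stands in for the shared output stream

// Hypothetical: may block (e.g. reach a safepoint) while producing text,
// so it must not run while tty_lock is held.
std::string describe_method(const std::string& name) {
  std::ostringstream buf;
  buf << "# {method} '" << name << "'\n";
  return buf.str();
}

void print_assembly_header(const std::string& name) {
  // 1. Render the potentially blocking part first, with no lock held.
  std::string metadata = describe_method(name);
  // 2. Then take the lock and emit header and metadata together, so no
  //    other thread can interleave output in between.
  std::lock_guard<std::mutex> guard(tty_lock);
  tty << "----- assembly -----\n" << metadata;
}
```

Because the buffer grows as needed, this ordering does not depend on guessing a fixed buffer size for the metadata.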

Testing:
Repro from bug stops failing.
Running :hotspot_all 
(http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)

Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/

Regards,
Nils Eliasson

From zoltan.majo at oracle.com  Fri Apr 15 15:11:37 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 15 Apr 2016 17:11:37 +0200
Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after
	eliminating phi with unique input
In-Reply-To: <571039FC.1000603@oracle.com>
References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com>
Message-ID: <571104A9.7060208@oracle.com>

Hi Vladimir,


thank you for the feedback!

On 04/15/2016 02:46 AM, Vladimir Kozlov wrote:
> I think check should use !isa_oopptr() since one of nodes could be 
> ConP NULL ptr which is not klassptr.

Here is the updated webrev:
http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/

RBT testing passes. I did ~70 runs with the reproducer, no problems have 
shown up so far. I'll do ~900 more runs, though.
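The decision this patch refines can be modelled as a small function that chooses the cast replacing the phi. This is a deliberately simplified sketch, not C2's PhiNode code: `TypeKind`, `CastKind`, and `choose_cast` are hypothetical, and real C2 treats null-ness and klass changes in more detail. It captures the point from review that testing `!isa_oopptr()`, rather than "is a klass pointer", also covers a NULL ConP input, which is not a klass pointer.

```cpp
// Hypothetical classification of the phi's type and its unique input's type.
enum class TypeKind { Int, OopPtr, KlassPtr, NullPtr };
enum class CastKind { CastII, CastPP, CheckCastPP };

bool isa_oopptr(TypeKind t) { return t == TypeKind::OopPtr; }

CastKind choose_cast(TypeKind phi_type, TypeKind uin_type) {
  if (phi_type == TypeKind::Int) {
    return CastKind::CastII;  // integer phi
  }
  if (!isa_oopptr(phi_type) && !isa_oopptr(uin_type)) {
    // Neither side is an oop pointer (e.g. klass pointers, or a NULL
    // constant): only a CastPP is appropriate.
    return CastKind::CastPP;
  }
  // Oop pointers (simplified): a CheckCastPP for a change of klass; real
  // C2 may also use a CastPP for a not-null cast.
  return CastKind::CheckCastPP;
}
```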

Thank you!

Best regards,


Zoltan

>
> Thanks,
> Vladimir
>
> On 4/14/16 6:21 AM, Zoltán Majó wrote:
>> Hi,
>>
>>
>> please review the patch for 8153357.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153357
>>
>> Problem: When determining the unique input of a phi, the C2 compiler 
>> removes cast nodes connecting the phi to its unique input.
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 
>>
>>
>> Then (if the phi has indeed a unique input), the C2 compiler attempts to 
>> replace the phi with a cast node. The new cast node feeds from the 
>> unique input.
>>
>> To be able to remove the phi node, the C2 compiler must determine 
>> the type of cast to add in place of the phi node (CastII, CastPP, or 
>> CheckCastPP).
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 
>>
>>
>> The failure in the bug report appears because the C2 compiler adds a 
>> cast node of unexpected type to the graph (a CheckCastPP instead of a 
>> CastPP when casting between two klass pointers).
>>
>> Please find more details about the cause of the failure in the bug 
>> description:
>> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 
>>
>>
>>
>> Solution: Refine C2's logic to determine the type of cast node added.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/
>>
>> Testing:
>> - JPRT;
>> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp);
>> - 500 non-failing runs with the reproducer (the problem reproduces 
>> with < 100 runs).
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>


From zoltan.majo at oracle.com  Fri Apr 15 15:25:01 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 15 Apr 2016 17:25:01 +0200
Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if
	on-stack-replacement is enabled
Message-ID: <571107CD.8070205@oracle.com>

Hi,


please review the patch for 8072428.

https://bugs.openjdk.java.net/browse/JDK-8072428

Problem: On-stack-replacement requires loop counters; disabling loop 
counters with on-stack-replacement enabled triggers an assert.

Solution: Set UseLoopCounter ergonomically if on-stack-replacement is 
enabled. Print warning.
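This ergonomic adjustment amounts to a flag fix-up during argument processing. Here is a minimal sketch that assumes a plain struct in place of HotSpot's global flags and `std::fprintf` in place of the VM's warning mechanism. The warning text is made up for illustration.

```cpp
#include <cstdio>

// Hypothetical stand-in for the two VM flags involved.
struct Flags {
  bool UseOnStackReplacement;
  bool UseLoopCounter;
};

// OSR relies on loop counters, so re-enable them if the user turned them
// off while leaving OSR on. Returns true if a flag was changed.
bool adjust_loop_counter_flags(Flags& f) {
  if (f.UseOnStackReplacement && !f.UseLoopCounter) {
    std::fprintf(stderr,
                 "warning: On-stack-replacement requires loop counters; "
                 "enabling UseLoopCounter\n");
    f.UseLoopCounter = true;
    return true;
  }
  return false;
}
```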

Webrev:
http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/

Tested with locally-built VM (linux_x64).

Thank you!

Best regards,


Zoltan


From long.chen at linaro.org  Fri Apr 15 12:45:44 2016
From: long.chen at linaro.org (Long Chen)
Date: Fri, 15 Apr 2016 20:45:44 +0800
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
Message-ID: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>

Hi

Please review this patch making use of DC ZVA to do block zeroing.

http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.patch

I'm sorry that I can't produce a test case matching the 'clear_array'
pattern that shows an obvious improvement. However, generating 'DC ZVA' should be
the right thing to do, as it usually has better cache behavior. Besides,
gcc and Linux's memset already use 'DC ZVA'.
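Block-zeroing instructions such as DC ZVA clear one naturally aligned block at a time, with a block size the CPU reports in DCZID_EL0. A zeroing routine therefore typically splits the range into an unaligned head, whole aligned blocks, and a tail. The portable sketch below shows only that split. `zero_block` is a hypothetical hook where an AArch64 build would issue DC ZVA; here it falls back to `std::memset`. The 64-byte block size used in the test is an assumption, not a value taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical hook: on AArch64 this would be one DC ZVA on an aligned
// block; here it is a portable memset.
static void zero_block(unsigned char* p, std::size_t block_size) {
  std::memset(p, 0, block_size);
}

// Zero [p, p + len), using whole-block operations for the aligned middle
// part. block_size must be a power of two. Returns the number of blocks
// cleared with zero_block.
std::size_t zero_with_blocks(unsigned char* p, std::size_t len,
                             std::size_t block_size) {
  std::uintptr_t start = reinterpret_cast<std::uintptr_t>(p);
  std::uintptr_t end = start + len;
  std::uintptr_t first = (start + block_size - 1) & ~(block_size - 1);
  std::uintptr_t last = end & ~(block_size - 1);
  if (first >= last) {  // range too small to contain a whole block
    std::memset(p, 0, len);
    return 0;
  }
  std::memset(p, 0, first - start);  // unaligned head
  std::size_t blocks = 0;
  for (std::uintptr_t a = first; a < last; a += block_size, ++blocks) {
    zero_block(p + (a - start), block_size);  // aligned body
  }
  std::memset(p + (last - start), 0, end - last);  // unaligned tail
  return blocks;
}
```

The block path only pays off when the range covers several whole blocks, which matches the observation that ArrayFillByte benefits for large arrays.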

The ArrayFillByte case benefits from 'DC ZVA' when the array length is
large.
Test, http://people.linaro.org/~long.chen/block_zeroing/ArrayFillByte.java
Performance result,
http://people.linaro.org/~long.chen/block_zeroing/BlockZeroing.html

Tested with jtreg hotspot and langtools.

Thanks!

From igor.ignatyev at oracle.com  Fri Apr 15 16:50:57 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 15 Apr 2016 09:50:57 -0700
Subject: RFR(XS): 8154174: improve JitTester performance
In-Reply-To: <570F9B88.2040102@oracle.com>
References: <570F9B88.2040102@oracle.com>
Message-ID: <ABD79335-4DA4-4919-889A-D8FF3D2FCB7B@oracle.com>

Hi Anton,

looks good to me, thanks for doing that.

Igor
> On Apr 14, 2016, at 6:30 AM, Anton Ivanov <anton.ivanov at oracle.com> wrote:
> 
> Hi,
> Please review a small patch that improves JitTester performance.
> 
> In the current implementation JitTester has exception-based logic, which is not good by itself, but changing this is quite expensive, and there is a simple way to decrease the exception overhead: turn off the stack trace in the ProductionFailedException constructor (this exception is created very often and its stack trace is never needed, as it is only used to control program flow).
> Also, a small improvement was made in the code that does a deep copy of a SymbolTable element (Map iteration was rewritten to get rid of multiple redundant Map.get() calls, which cost O(1) only in the average case and could potentially be worse).
> 
> Testing: local
> 
> webrev: http://cr.openjdk.java.net/~aaivanov/8154174/webrev
> bug: https://bugs.openjdk.java.net/browse/JDK-8154174
> 
> -- 
> Best regards,
> Anton Ivanov
> 


From vladimir.kozlov at oracle.com  Fri Apr 15 17:14:05 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Apr 2016 10:14:05 -0700
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <5710CF13.5090404@oracle.com>
References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com>
	<5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com>
	<5710CF13.5090404@oracle.com>
Message-ID: <5711215D.4060202@oracle.com>

Looks good. Make sure the test is executed in JPRT.

Thanks,
Vladimir

On 4/15/16 4:22 AM, Nils Eliasson wrote:
> Hi Tobias,
>
> Thanks for your feedback!
>
> New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03
>
> Regards,
> Nils
>
> On 2016-04-15 13:15, Tobias Hartmann wrote:
>> Hi Nils,
>>
>> On 15.04.2016 11:39, Nils Eliasson wrote:
>>> Thanks Vladimir!
>>> On 2016-04-15 01:41, Vladimir Kozlov wrote:
>>>> I agree with this simple change as the fix.
>>>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication
>>>> if Xcomp was used.
>>>> I don't see a PIT link in the bug report.
>>> There was none; Tobias found this regression while testing something else.
>>>
>>> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java
>>>
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/
>> Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30
>> ("Sanity test flag combo..") because this has nothing to do with support for blocking compiles.
>>
>> Otherwise looks good to me.
>>
>> Best regards,
>> Tobias
>>
>>> Regards,
>>> Nils
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/14/16 6:17 AM, Nils Eliasson wrote:
>>>>> Hi,
>>>>>
>>>>> Please review this fix.
>>>>>
>>>>> Summary:
>>>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return
>>>>> there that just ignored the compile.
>>>>>
>>>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation:
>>>>> UseInterpreter is set to false (but the interpreter is still available) and then some
>>>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker.
>>>>>
>>>>> Solution:
>>>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger.
>>>>> This time I chose to extend the _initialized check in compile_method. I didn't add any
>>>>> logging or warning because this is really a corner case.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
>>>>> (Ignore the extra tags in the webrev)
>>>>>
>>>>> Best regards,
>>>>> Nils Eliasson
>

From vladimir.kozlov at oracle.com  Fri Apr 15 17:27:40 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Apr 2016 10:27:40 -0700
Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after
	eliminating phi with unique input
In-Reply-To: <571104A9.7060208@oracle.com>
References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com>
	<571104A9.7060208@oracle.com>
Message-ID: <5711248C.7000503@oracle.com>

Looks good to me.

thanks,
Vladimir

On 4/15/16 8:11 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> thank you for the feedback!
>
> On 04/15/2016 02:46 AM, Vladimir Kozlov wrote:
>> I think check should use !isa_oopptr() since one of nodes could be ConP NULL ptr which is not klassptr.
>
> Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/
>
> RBT testing passes. I did ~70 runs with the reproducer, no problems have shown up so far. I'll do ~900 more runs, though.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/14/16 6:21 AM, Zoltán Majó wrote:
>>> Hi,
>>>
>>>
>>> please review the patch for 8153357.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8153357
>>>
>>> Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its
>>> unique input.
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181
>>>
>>> Then (if the phi has indeed a unique input), the C2 compiler attempts to replace the phi with a cast node. The new cast
>>> node feeds from the unique input.
>>>
>>> To be able to remove the phi node, the C2 compiler must determine the type of cast to add in place of the phi
>>> node (CastII, CastPP, or CheckCastPP).
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705
>>>
>>> The failure in the bug report appears because the C2 compiler adds a cast node of unexpected type to the graph (a
>>> CheckCastPP instead of a CastPP when casting between two klass pointers).
>>>
>>> Please find more details about the cause of the failure in the bug description:
>>> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108
>>>
>>>
>>>
>>> Solution: Refine C2's logic to determine the type of cast node added.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/
>>>
>>> Testing:
>>> - JPRT;
>>> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp);
>>> - 500 non-failing runs with the reproducer (the problem reproduces with < 100 runs).
>>>
>>> Thank you and best regards,
>>>
>>>
>>> Zoltan
>>>
>

From vladimir.kozlov at oracle.com  Fri Apr 15 17:34:51 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Apr 2016 10:34:51 -0700
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <5710DD7E.1060105@redhat.com>
References: <dk61t683cx2.fsf@rwestrel.remote.csb>
	<571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com>
Message-ID: <5711263B.9070505@oracle.com>

On 4/15/16 5:24 AM, Roland Westrelin wrote:
> Hi Vladimir,
>
> Thanks for looking at this.
>
>> I agree with optimization but I am not sure about changes.
>
> Is this an optimization? It looks more like a bug to me.

Code is correct but not optimal. I don't think it is bug.

>
>> Can we check only one previous block to be more conservative?:
>>
>> Block* b = prev(targ_block);
>> bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment()
>> && !b->head()->is_Loop();
>
> That would be good enough as far as I can tell. Here is a new webrev:
>
> http://cr.openjdk.java.net/~roland/8154135/webrev.01/

Looks good.

>
>> Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? Maybe
>> for RISC cpus (with fixed instruction size) we should change them.
>
> Thanks for the pointer. That said, I don't see what could prevent the
> problem I see from happening on x86, so to me it looks like a bug rather
> than a tuning problem.

NumberOfLoopInstrToAlign code is used only on x86 and may hide the problem you see.
And I suggested looking at that code to see if we can get additional performance benefits on RISC (on arm64 in your case).

Thanks,
Vladimir

>
> Roland.
>

From vladimir.kozlov at oracle.com  Fri Apr 15 17:44:32 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Apr 2016 10:44:32 -0700
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <5711046B.9080808@oracle.com>
References: <5711046B.9080808@oracle.com>
Message-ID: <57112880.1010204@oracle.com>

Use resizable stream:

stringStream(size_t initial_bufsize = 256);

1024 may not be enough.

Thanks,
Vladimir

On 4/15/16 8:10 AM, Nils Eliasson wrote:
> Hi,
>
> Please review this fix of print opto_assembly.
>
> Summary:
> The compilelog can get corrupted and the VM may assert on "failed: bad tag in log".
>
> When printing assembly in output.cpp we first take the ttylock, print the header and then the method metadata. However, the
> metadata printing makes a VM entry and may block for a safepoint, which releases the lock
> (break_tty_lock_for_safepoint). Another compiler thread that has not safepointed can then take the lock, so by the time
> the safepoint is over and the first thread resumes logging, the log is already broken.
>
> Solution:
> Print the method metadata to a temporary buffer, then take the tty lock.
>
> Testing:
> Repro from bug stops failing.
> Running :hotspot_all (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>
> Regards,
> Nils Eliasson

From rwestrel at redhat.com  Fri Apr 15 18:17:47 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 15 Apr 2016 20:17:47 +0200
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <5711263B.9070505@oracle.com>
References: <dk61t683cx2.fsf@rwestrel.remote.csb>
	<571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com>
	<5711263B.9070505@oracle.com>
Message-ID: <5711304B.3040907@redhat.com>

>> http://cr.openjdk.java.net/~roland/8154135/webrev.01/
> 
> Looks good.

Thanks for the review. I need a sponsor now...

> NumberOfLoopInstrToAlign code is used only on x86 and may hide the
> problem you see.
> And I suggested looking at that code to see if we can get additional
> performance benefits on RISC (on arm64 in your case).

Ok. Again, thanks for the pointer to that piece of code.

Roland.

From vladimir.x.ivanov at oracle.com  Fri Apr 15 18:25:26 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 15 Apr 2016 21:25:26 +0300
Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched
	unsafe accesses
In-Reply-To: <57103FF2.5060907@oracle.com>
References: <570FCB02.6000507@oracle.com> <57103FF2.5060907@oracle.com>
Message-ID: <57113216.8030906@oracle.com>

Thanks for the feedback, Vladimir.

Updated version:
   http://cr.openjdk.java.net/~vlivanov/8134918/webrev.01/

Additional changes:

   * alias type doesn't differentiate between byte[] & boolean[]; use 
address type to narrow the basic type;

> Next assert should be at the beginning of method:
> +     assert(type != T_OBJECT || !unaligned, "unaligned access not
> supported with object type");
Fixed.

> Fix Copyright year in the test.
Fixed.

> There is no PIT link in the bug report.
Added.
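The policy in this fix can be summarized in a small model. It declines to intrinsify an unsafe object access whose target is not an oop location, and it checks the unaligned-object assert first. This is a hypothetical sketch. `BasicTypeModel`, `can_intrinsify_unsafe_access`, and the idea of passing the location's type explicitly are assumptions for illustration. In C2, that information comes from the (possibly speculated) address and alias types.

```cpp
#include <cassert>

// Hypothetical subset of basic types.
enum BasicTypeModel { T_BOOLEAN, T_BYTE, T_INT, T_LONG, T_OBJECT };

// Returns false when the access should be left to the non-intrinsic path.
bool can_intrinsify_unsafe_access(BasicTypeModel access_type,
                                  BasicTypeModel location_type,
                                  bool unaligned) {
  // Checked at the beginning, as requested in review.
  assert(access_type != T_OBJECT || !unaligned);
  if (access_type == T_OBJECT && location_type != T_OBJECT) {
    return false;  // oop access on a non-oop location: don't intrinsify
  }
  return true;
}
```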

Best regards,
Vladimir Ivanov

>
> Thanks,
> Vladimir
>
> On 4/14/16 9:53 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8134918
>>
>> Type speculation can produce mismatched unsafe accesses.
>>
>> It injects a guard based on profile data and then propagates type info
>> down to the users. If there's an unsafe access, it can become
>> mismatched w.r.t. profile data being used.
>>
>> It happens even for valid usages. If an unsafe access always matches
>> memory location at runtime, the code produced by type speculation in
>> that case is effectively dead.
>>
>> What causes problems are unsafe OOP accesses (U.putObject()/getObject()
>> on non-OOP locations).
>>
>> The fix is to avoid intrinsification of problematic accesses. Type
>> speculation injects precise type information, which is available
>> during intrinsification.
>>
>> We could try to support mismatched unsafe object accesses instead, but
>> I don't see any value in that.
>>
>> Testing: JPRT, pit-hs-comp (in progress).
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Fri Apr 15 18:26:21 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 15 Apr 2016 21:26:21 +0300
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <5710DD7E.1060105@redhat.com>
References: <dk61t683cx2.fsf@rwestrel.remote.csb>
	<571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com>
Message-ID: <5711324D.6090000@oracle.com>

Looks good.

I'll sponsor the fix.

Best regards,
Vladimir Ivanov

On 4/15/16 3:24 PM, Roland Westrelin wrote:
> That would be good enough as far as I can tell. Here is a new webrev:
>
> http://cr.openjdk.java.net/~roland/8154135/webrev.01/

From vladimir.kozlov at oracle.com  Fri Apr 15 18:52:47 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Apr 2016 11:52:47 -0700
Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched
	unsafe accesses
In-Reply-To: <57113216.8030906@oracle.com>
References: <570FCB02.6000507@oracle.com> <57103FF2.5060907@oracle.com>
	<57113216.8030906@oracle.com>
Message-ID: <5711387F.1040600@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/15/16 11:25 AM, Vladimir Ivanov wrote:
> Thanks for the feedback, Vladimir.
>
> Updated version:
>    http://cr.openjdk.java.net/~vlivanov/8134918/webrev.01/
>
> Additional changes:
>
>    * alias type doesn't differentiate between byte[] & boolean[]; use address type to narrow the basic type;
>
>> Next assert should be at the beginning of method:
>> +     assert(type != T_OBJECT || !unaligned, "unaligned access not
>> supported with object type");
> Fixed.
>
>> Fix Copyright year in the test.
> Fixed.
>
>> There is no PIT link in the bug report.
> Added.
>
> Best regards,
> Vladimir Ivanov
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/14/16 9:53 AM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8134918
>>>
>>> Type speculation can produce mismatched unsafe accesses.
>>>
>>> It injects a guard based on profile data and then propagates type info
>>> down to the users. If there's an unsafe access, it can become
>>> mismatched w.r.t. profile data being used.
>>>
>>> It happens even for valid usages. If an unsafe access always matches
>>> memory location at runtime, the code produced by type speculation in
>>> that case is effectively dead.
>>>
>>> What causes problems are unsafe OOP accesses (U.putObject()/getObject()
>>> on non-OOP locations).
>>>
>>> The fix is to avoid intrinsification of problematic accesses. Type
>>> speculation injects precise type information, which is available
>>> during intrinsification.
>>>
>>> We could try to support mismatched unsafe object accesses instead, but
>>> I don't see any value in that.
>>>
>>> Testing: JPRT, pit-hs-comp (in progress).
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov

From vladimir.kozlov at oracle.com  Fri Apr 15 18:56:16 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Apr 2016 11:56:16 -0700
Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if
	on-stack-replacement is enabled
In-Reply-To: <571107CD.8070205@oracle.com>
References: <571107CD.8070205@oracle.com>
Message-ID: <57113950.9070700@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/15/16 8:25 AM, Zoltán Majó wrote:
> Hi,
>
>
> please review the patch for 8072428.
>
> https://bugs.openjdk.java.net/browse/JDK-8072428
>
> Problem: On-stack-replacement requires loop counters; disabling loop counters with on-stack-replacement enabled triggers
> an assert.
>
> Solution: Set UseLoopCounter ergonomically if on-stack-replacement is enabled. Print warning.
>
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/
>
> Tested with locally-built VM (linux_x64).
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>

From rwestrel at redhat.com  Fri Apr 15 18:58:39 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 15 Apr 2016 20:58:39 +0200
Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body
In-Reply-To: <5711324D.6090000@oracle.com>
References: <dk61t683cx2.fsf@rwestrel.remote.csb>
	<571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com>
	<5711324D.6090000@oracle.com>
Message-ID: <571139DF.4070607@redhat.com>

> Looks good.
> 
> I'll sponsor the fix.

Thanks for the review and for pushing it!

Roland.

From vladimir.kozlov at oracle.com  Fri Apr 15 19:30:15 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Apr 2016 12:30:15 -0700
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <57109BE2.1090602@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
	<f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
	<570632FF.7090103@redhat.com>
	<805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap>
	<63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap>
	<5710478C.8050200@oracle.com> <57109BE2.1090602@oracle.com>
Message-ID: <57114147.5060206@oracle.com>

Thank you, Jamsheed. Testing results looks fine so far. I am pushing it.

Thanks,
Vladimir

On 4/15/16 12:44 AM, Jamsheed C m wrote:
> Hi Vladimir,
>
> PIT testing is in progress; the link is available in the bug report.
>
> Best Regards,
> Jamsheed
>
> On 4/15/2016 7:14 AM, Vladimir Kozlov wrote:
>> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/12/16 2:45 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> I think we have come to a common understanding and there was no complaint about my latest webrev:
>>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>>>
>>> Can I consider it reviewed?
>>> Can somebody sponsor, please?
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin
>>> Sent: Donnerstag, 7. April 2016 12:52
>>> To: Andrew Haley <aph at redhat.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
>>>
>>> Hi Andrew, Jamsheed and all,
>>>
>>> thank you very much for your input.
>>>
>>> As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count().
>>> Therefore, I have replaced the storestore barrier introduced with JDK-8143897  (even though this barrier was also
>>> correct).
>>>
>>> My change still contains a releasing store for newly created ExceptionCache instances.
>>> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce
>>> additional false negatives on weak memory model platforms.
>>> I think having the release doesn't hurt too much and makes the design a little cleaner.
>>>
>>> I also added comments based on your input.
>>>
>>> The new webrev is here:
>>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>>>
>>> Please review. I will also need a sponsor from Oracle, please.
>>>
>>> Thanks again and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Andrew Haley [mailto:aph at redhat.com]
>>> Sent: Donnerstag, 7. April 2016 12:14
>>> To: Doerr, Martin <martin.doerr at sap.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
>>>
>>> On 07/04/16 10:08, Doerr, Martin wrote:
>>>
>>>> atomic update for the _count would only be required if there were
>>>> multiple threads which attempt to increment it
>>>> concurrently. However, updates are under lock, so we only have
>>>> concurrent readers which is ok.
>>>>
>>>> I still think "volatile" does what we need here. Especially the xlC
>>>> compiler on AIX tends to reload variables from memory. Exactly this
>>>> can be prevented by making the field volatile.
>>>
>>> I think your latest patch is OK.  Whether volatile is really good
>>> enough, I don't know.  The new(ish) C++ memory model treats this as a
>>> race, and therefore undefined behaviour.  Old C++ didn't have a memory
>>> model, so the best we can do with racy code is guess about what our
>>> compilers might do.
>>>
>>> I certainly much prefer a release_store to the storestore fence used
>>> in the fix for 8143897.
>>>
>>> Andrew.
>>>
>
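A minimal C++11 sketch of the publication pattern discussed above: entries are added under a lock, readers scan without one, and the count is published with a releasing store paired with an acquiring load. It uses `std::atomic` rather than HotSpot's `OrderAccess`/`volatile` idioms (Andrew's point being that under the C++11 memory model a racy plain or `volatile` access is undefined behaviour). The class and member names are hypothetical and do not reproduce the real `ExceptionCache` from the webrev.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical, simplified model of a cache written under a lock but read
// without one. ExceptionCacheSketch and kCapacity are illustrative names.
class ExceptionCacheSketch {
 public:
  static constexpr int kCapacity = 16;

  // Writer side: the lock guarantees a single incrementing thread.
  bool add(int pc_offset, int handler_offset) {
    std::lock_guard<std::mutex> guard(lock_);
    int n = count_.load(std::memory_order_relaxed);  // only writers change count_
    if (n >= kCapacity) return false;
    pcs_[n] = pc_offset;            // plain stores fill the new entry...
    handlers_[n] = handler_offset;
    // ...and the releasing store publishes it: a reader that observes the
    // new count also observes the entry written before it.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side: lock-free; the acquiring load pairs with the release above.
  int lookup(int pc_offset) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pcs_[i] == pc_offset) return handlers_[i];
    }
    return -1;  // not cached
  }

 private:
  std::mutex lock_;
  std::atomic<int> count_{0};
  int pcs_[kCapacity] = {};
  int handlers_[kCapacity] = {};
};
```

A reader may still miss an entry added concurrently (a false negative, which is harmless for a cache), but with this pairing it cannot observe a count that runs ahead of the entries it covers.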

From christian.thalinger at oracle.com  Fri Apr 15 20:36:33 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 15 Apr 2016 10:36:33 -1000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
	<57103B20.1040207@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>
Message-ID: <532B4EF9-CE01-4BCF-8B9A-396D6B004BED@oracle.com>


> On Apr 14, 2016, at 11:04 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> Vladimir, the code has been updated and is available at:
> 
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ 

Much better!

There are a few smaller things:
+class MaskCreateINode : public Node {
Didn't we agree on CreateMaskINode?

Names should be consistent: MaskCreateINode -> CreateMaskINode, set_mask -> createMask.

+    Flag_has_vect_mask_set           = Flag_is_scheduled << 1,
+  bool has_vect_mask_set() const { return (_flags & Flag_has_vect_mask_set) != 0; }
Please rename to *has_vector_mask_set.
+const bool Matcher::has_predicated_vectors(void) {
+  bool ret_value = false;
+  switch(UseAVX) {
+    case 0:
+    case 1:
+    case 2:
+      break;
+
+    case 3:
+      ret_value = VM_Version::supports_avx512vl();
+      break;
+  }
+
+  return ret_value;
+}
Change this to:
+const bool Matcher::has_predicated_vectors(void) {
+  switch (UseAVX) {
+    case 3:
+      return VM_Version::supports_avx512vl();
+    default:
+      return false;
+  }
+}
src/share/vm/opto/matcher.hpp

+  // Some uarchs have predicated registers on vectors
Is "uarchs" a typo?

> 
> Thanks,
> Michael
> 
> -----Original Message-----
> From: Berg, Michael C 
> Sent: Thursday, April 14, 2016 5:54 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: RE: CR for RFR 8153998
> 
> Ok, now that we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32-bit and 64-bit .ad files for CountedLoopEnd.
> It will be clean when next you see the code.
> 
> -Michael
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 5:52 PM
> To: Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
> 
> On 4/14/16 5:12 PM, Berg, Michael C wrote:
>> The restore mark is sizeless; it restores the known global configuration value for k1, which is used automatically in all of code gen.
> 
> How is it sizeless when it generates a kmovwl() instruction?
> Do you mean it does not have side effects (no flags modified)?
> 
> Vladimir
> 
>> 
>> Ok, I will try the pattern match method.
>> 
>> Thanks
>> -Michael
>> 
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 5:02 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger 
>> <christian.thalinger at oracle.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>> 
>> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>>> Vladimir,
>>> 
>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>>> 
>>> I tried something like that early on with CountedLoopEnd.
>> 
>> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically).
>> I don't see any side effects for restoremask in your code. What are you talking about?
>> 
>> I am suggesting something like next:
>> 
>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>    predicate(n->has_vect_mask_set());
>>    match(CountedLoopEnd cop cr);
>>    effect(USE labl);
>> 
>>    ins_cost(400);
>>    format %{ "j$cop     $labl\t# loop end\n\t"
>>              "restoremask \t# vector mask restore for loops"
>>           %}
>>    ins_encode %{
>>      Label* L = $labl$$label;
>>      __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>      __ restoremask();
>>    %}
>>    ins_pipe(pipe_jcc);
>> %}
>> 
>> Vladimir
>> 
>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or how I do it now?  We would basically be doing it in the same place.
>>> 
>>> -Michael
>>> 
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, April 14, 2016 4:27 PM
>>> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg, 
>>> Michael C <michael.c.berg at intel.com>
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: CR for RFR 8153998
>>> 
>>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>> 
>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>> 
>>>>> Christian,
>>>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>> 
>>>> That's unfortunate but I understand.  I'm fine with it then.
>>> 
>>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there.
>>> 
>>> Vladimir
>>> 
>>>> 
>>>>> Regards,
>>>>> Michael
>>>>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:* Thursday, April 14, 2016 11:20 AM
>>>>> *To:* Berg, Michael C <michael.c.berg at intel.com>
>>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>> *Subject:* Re: CR for RFR 8153998
>>>>> 
>>>>>      On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>>      See below for context.
>>>>>      Regards,
>>>>>      Michael
>>>>>      *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>      *Sent:* Wednesday, April 13, 2016 2:08 PM
>>>>>      *To:* Berg, Michael C <michael.c.berg at intel.com>
>>>>>      *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>>      *Subject:* Re: CR for RFR 8153998
>>>>> 
>>>>>          On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>>          Hi Folks,
>>>>> 
>>>>>          I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>>          This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>>          Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x
>>>>>          performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>>          This code was tested as follows (see JBS entry below):
>>>>> 
>>>>>          Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>> 
>>>>>          webrev:
>>>>>          http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>> 
>>>>> 
>>>>> +//------------------------------MachMskNode-----------------------------------
>>>>> 
>>>>>      +// Machine function Msk Node
>>>>> 
>>>>>      +class MachMskNode : public MachIdealNode {
>>>>> 
>>>>>      Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>>>>      <MCB> Ok, that's easy enough.
>>>>>      Also, I don't quite understand why we have:
>>>>> 
>>>>>      +instruct set_mask(rRegI dst, rRegI src) %{
>>>>> 
>>>>>      +  predicate(VM_Version::supports_avx512vl());
>>>>> 
>>>>>      +  match(Set dst (MaskCreateI src));
>>>>> 
>>>>>      +  effect(TEMP dst);
>>>>> 
>>>>>      +  format %{ "createmsk   $dst, $src" %}
>>>>> 
>>>>>      +  ins_encode %{
>>>>> 
>>>>>      +    __ createmsk($dst$$Register, $src$$Register);
>>>>> 
>>>>>      +  %}
>>>>> 
>>>>>      but:
>>>>> 
>>>>>      +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>> 
>>>>>      +    MacroAssembler _masm(&cbuf);
>>>>> 
>>>>>      +    __ restoremsk();
>>>>> 
>>>>>      +  }
>>>>> 
>>>>>      The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>>>      The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>>>      The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>>>      The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>>> 
>>>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>>> 
>>>>>          Thanks,
>>>>>          Michael
>>>> 

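As a rough scalar analogy of the masked post loop described in this thread (not the actual `createmsk`/`restoremask` code, whose instruction sequences are not shown here): the remaining trip count is turned into a lane mask with the low bits set, and a single predicated vector iteration then touches only the active lanes. `kLanes`, `create_mask` and `masked_add` are hypothetical names; on an AVX-512 target the mask would be held in k1 rather than in a C++ variable.

```cpp
#include <cstdint>

// Hypothetical vector width (e.g. 16 x 32-bit lanes in a 512-bit register).
constexpr int kLanes = 16;

// Build a lane mask with the low `remaining` bits set, the way a
// createmsk-style step derives a mask from the remaining iterations.
// Assumes remaining >= 0.
std::uint32_t create_mask(int remaining) {
  if (remaining >= kLanes) return (1u << kLanes) - 1;  // all lanes active
  return (1u << remaining) - 1;                        // only the low lanes
}

// One predicated "vector" iteration: lanes whose mask bit is clear are
// left untouched, so no element past the loop bound is written.
void masked_add(int* dst, const int* a, const int* b, std::uint32_t mask) {
  for (int lane = 0; lane < kLanes; lane++) {
    if (mask & (1u << lane)) dst[lane] = a[lane] + b[lane];
  }
}
```

This is why one masked iteration can stand in for a scalar fix-up loop: with 5 elements remaining, `create_mask(5)` yields `0x1F`, and lanes 5 through 15 are not written.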

From christian.thalinger at oracle.com  Fri Apr 15 20:43:10 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 15 Apr 2016 10:43:10 -1000
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <5710E772.5050801@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
	<CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
	<5710E772.5050801@oracle.com>
Message-ID: <B3A4C7A2-E9B6-403A-BF0A-8B64049F4338@oracle.com>


> On Apr 15, 2016, at 3:06 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> Hi,
> 
> On 2016-04-14 20:45, Christian Thalinger wrote:
>> 
>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>>> 
>>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested.
>>> 
>>> It gets verbose in the method declarations in compileBroker
>> 
>> Don't worry about this.
>> 
>>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too.
>> 
>> Yes, that's the right place.
>> 
>>> 
>>> New webrev:
>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/
>> 
>> +   bool         can_become_stale() const          {
>> +     return !_is_blocking && (_compile_reason < Reason_Whitebox);
>> +   }
>> I'm not a fan of implicit contracts just defined by comments.  This method doesn't seem to be performance critical so I would suggest using a switch-case.  An attribute on the enum would be much better but we all know this isn't Java.
> 
> As you suggested:
> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04

Thanks.  A space is missing and the closing } indent is wrong:
+   bool         can_become_stale() const          {
+     switch(_compile_reason) {
+       case Reason_BackedgeCount:
+       case Reason_InvocationCount:
+       case Reason_Tiered:
+         return !_is_blocking;
+       }
+     return false;
+   }
Also, what about:
+       Reason_None,
+       Reason_CTW,              // Compile the world
+       Reason_Replay,           // ciReplay
These were covered before.
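A sketch of `can_become_stale()` written as an explicit switch, as suggested above, with the spacing and brace indentation fixed. The enum is a hypothetical subset containing only the reasons named in this thread; the real `CompileReason` in the webrev may have more members in a different order, and `CompileTaskSketch` stands in for `CompileTask`.

```cpp
// Hypothetical subset of CompileReason: only the members named in the thread.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,        // Compile the world
  Reason_Replay,     // ciReplay
  Reason_Whitebox
};

class CompileTaskSketch {
 public:
  CompileTaskSketch(CompileReason reason, bool is_blocking)
      : _compile_reason(reason), _is_blocking(is_blocking) {}

  // Explicit classification instead of relying on enum order
  // (the earlier `_compile_reason < Reason_Whitebox` test).
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;  // policy-triggered, non-blocking tasks only
      default:
        return false;          // None, CTW, Replay, Whitebox, ...
    }
  }

 private:
  CompileReason _compile_reason;
  bool _is_blocking;
};
```

The `default:` branch keeps any unlisted reason non-stale. Listing every member and omitting `default:` would instead let `-Wswitch` flag a reason added later that has not been classified.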

> 
> Also made reasons CTW and Replay not stale-able. 
> 
> Thanks!
> Nils
> 
>> 
>>> 
>>> Thanks!
>>> Nils
>>> 
>>> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>>>> Very nice, I like it.
>>>> 
>>>> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>>>> Hi,
>>>>> 
>>>>> New webrev:
>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>>>>> 
>>>>> Summary
>>>>> Introduced an enum CompileReason with members matching all the old
>>>>> variants, and a table containing all the unchanged strings. I see the
>>>>> possibility of removing/changing/simplifying some CompileReasons but
>>>>> have chosen not to do so in this change.
>>>>> 
>>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>> 
>>>>> Testing:
>>>>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>>>> 
>>>>> Regards,
>>>>> Nils Eliasson
>>>>> 
>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>>> 
>>>>>>> I'll do a proper fix, it is the right thing to do and should be pretty
>>>>>>> quick. I'll change the comment to an enum that represent who submitted
>>>>>>> the compile, and add a table for the comments. This could be useful in
>>>>>>> other settings too.
>>>>>> 
>>>>>> Sounds good.
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Nils
>>>>>>> 
>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>>>> What do you mean "stale"?
>>>>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>> 
>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>>> 
>>>>>>>>> Summary:
>>>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>>>> the compile queue as stale.
>>>>>>>>> 
>>>>>>>>> Solution:
>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>>>> stale while the test is running. (Also added some extra
>>>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>>>> 
>>>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>>>> task with info about the origin of the compile. The comment field has
>>>>>>>>> this information - but then it needs to be
>>>>>>>>> converted to an enum.
>>>>>>>>> 
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>>>> 
>>>>>>>>> Best regards,
>>>>>>>>> Nils Eliasson
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 
> 


From michael.c.berg at intel.com  Sat Apr 16 04:20:33 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sat, 16 Apr 2016 04:20:33 +0000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
	<57103B20.1040207@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E631AF@FMSMSX102.amr.corp.intel.com>

Vladimir/Christian:

I believe I have addressed all concerns in this update:

Webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ 

Regards,
Michael

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Friday, April 15, 2016 2:04 AM
To: 'Vladimir Kozlov' <vladimir.kozlov at oracle.com>
Cc: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: RE: CR for RFR 8153998

Vladimir, the code has been updated and is available at:

webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ 

Thanks,
Michael

-----Original Message-----
From: Berg, Michael C
Sent: Thursday, April 14, 2016 5:54 PM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: CR for RFR 8153998

Ok, now that we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32-bit and 64-bit .ad files for CountedLoopEnd.
It will be clean when next you see the code.

-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, April 14, 2016 5:52 PM
To: Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8153998

On 4/14/16 5:12 PM, Berg, Michael C wrote:
> The restore mark is sizeless; it restores the known global configuration value for k1, which is used automatically in all of code gen.

How is it sizeless when it generates a kmovwl() instruction?
Do you mean it does not have side effects (no flags modified)?

Vladimir

>
> Ok, I will try the pattern match method.
>
> Thanks
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 5:02 PM
> To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger 
> <christian.thalinger at oracle.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>
> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>> Vladimir,
>>
>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>>
>> I tried something like that early on with CountedLoopEnd.
>
> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically).
> I don't see any side effects for restoremask in your code. What are you talking about?
>
> I am suggesting something like next:
>
> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>     predicate(n->has_vect_mask_set());
>     match(CountedLoopEnd cop cr);
>     effect(USE labl);
>
>     ins_cost(400);
>     format %{ "j$cop     $labl\t# loop end\n\t"
>               "restoremask \t# vector mask restore for loops"
>            %}
>     ins_encode %{
>       Label* L = $labl$$label;
>       __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>       __ restoremask();
>     %}
>     ins_pipe(pipe_jcc);
> %}
>
> Vladimir
>
>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or how I do it now?  We would basically be doing it in the same place.
>>
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 4:27 PM
>> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg, 
>> Michael C <michael.c.berg at intel.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>>
>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>
>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>
>>>> Christian,
>>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>
>>> That's unfortunate but I understand.  I'm fine with it then.
>>
>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there.
>>
>> Vladimir
>>
>>>
>>>> Regards,
>>>> Michael
>>>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>> *Sent:* Thursday, April 14, 2016 11:20 AM
>>>> *To:* Berg, Michael C <michael.c.berg at intel.com>
>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>> *Subject:* Re: CR for RFR 8153998
>>>>
>>>>       On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>       See below for context.
>>>>       Regards,
>>>>       Michael
>>>>       *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>       *Sent:* Wednesday, April 13, 2016 2:08 PM
>>>>       *To:* Berg, Michael C <michael.c.berg at intel.com>
>>>>       *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>       *Subject:* Re: CR for RFR 8153998
>>>>
>>>>           On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>>>           Hi Folks,
>>>>
>>>>           I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>           This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>           Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x
>>>>           performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>           This code was tested as follows (see JBS entry below):
>>>>
>>>>           Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>
>>>>           webrev:
>>>>           http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>
>>>>
>>>> +//------------------------------MachMskNode-----------------------------------
>>>>
>>>>       +// Machine function Msk Node
>>>>
>>>>       +class MachMskNode : public MachIdealNode {
>>>>
>>>>       Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>>>       <MCB> Ok, that's easy enough.
>>>>       Also, I don't quite understand why we have:
>>>>
>>>>       +instruct set_mask(rRegI dst, rRegI src) %{
>>>>
>>>>       +  predicate(VM_Version::supports_avx512vl());
>>>>
>>>>       +  match(Set dst (MaskCreateI src));
>>>>
>>>>       +  effect(TEMP dst);
>>>>
>>>>       +  format %{ "createmsk   $dst, $src" %}
>>>>
>>>>       +  ins_encode %{
>>>>
>>>>       +    __ createmsk($dst$$Register, $src$$Register);
>>>>
>>>>       +  %}
>>>>
>>>>       but:
>>>>
>>>>       +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>
>>>>       +    MacroAssembler _masm(&cbuf);
>>>>
>>>>       +    __ restoremsk();
>>>>
>>>>       +  }
>>>>
>>>>       The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>>       The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>>       The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>>       The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>>
>>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>>
>>>>           Thanks,
>>>>           Michael
>>>

From aph at redhat.com  Sat Apr 16 07:59:38 2016
From: aph at redhat.com (Andrew Haley)
Date: Sat, 16 Apr 2016 08:59:38 +0100
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
Message-ID: <5711F0EA.8020106@redhat.com>


+  void dc(cache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
+  }
+
+  void ic(cache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
   }

Are DC and IC really synonyms?

+typedef void (*_zero_Fn)(HeapWord* to, size_t count);
+
 static void pd_fill_to_aligned_words(HeapWord* tohw, size_t count, juint value) {
-  pd_fill_to_words(tohw, count, value);
+  if (UseBlockZeroing
+      && value == 0
+      && count >= (size_t)(BlockZeroingLowLimit >> LogHeapWordSize)) {
+    ((_zero_Fn)StubRoutines::zero_aligned_words())(tohw, count);
+  }
+  else {
+    pd_fill_to_words(tohw, count, value);
+  }
 }

I'm not convinced of the value of this.  We already know that a simple

  while (count-- > 0) {
    *to++ = v;
  }

turns into a call to memset() which does DC ZVA.
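To make the threshold arithmetic in the proposed `pd_fill_to_aligned_words()` concrete: `count` is in heap words, while `BlockZeroingLowLimit` appears to be in bytes, since it is shifted right by `LogHeapWordSize` before the comparison. The sketch below models that dispatch with `std::memset` standing in for the `DC ZVA`-based stub. The type, constant values and function names are assumptions for illustration (8-byte heap words, a 256-byte limit), not HotSpot's definitions. Andrew's point is that the plain word loop may already be lowered by the C++ compiler to a `memset()` that uses `DC ZVA`, which would make the explicit branch redundant.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins; values are assumptions, not HotSpot defaults.
typedef std::uint64_t HeapWordSketch;                // 8-byte heap word
const int LogHeapWordSizeSketch = 3;                 // log2(sizeof(HeapWordSketch))
const std::size_t kBlockZeroingLowLimitBytes = 256;  // threshold in bytes

// Plain word fill: the simple loop quoted above.
void fill_words(HeapWordSketch* to, std::size_t count, HeapWordSketch v) {
  while (count-- > 0) {
    *to++ = v;
  }
}

// Dispatch modelled on the proposed patch: large zero fills take a bulk
// zeroing path (memset here); everything else uses the word loop.
// Returns true when the bulk path was taken, for illustration only.
bool fill_aligned_words(HeapWordSketch* to, std::size_t count,
                        HeapWordSketch value) {
  // Convert the byte threshold to words before comparing with `count`.
  if (value == 0 &&
      count >= (kBlockZeroingLowLimitBytes >> LogHeapWordSizeSketch)) {
    std::memset(to, 0, count * sizeof(HeapWordSketch));
    return true;
  }
  fill_words(to, count, value);
  return false;
}
```

With these assumed values the bulk path starts at 256 >> 3 = 32 words; shorter or non-zero fills fall back to the word loop.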

diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
@@ -4670,11 +4670,54 @@
   BLOCK_COMMENT(is_string ? "} string_equals" : "} array_equals");
 }

+
+// base:     Address of a buffer to be zeroed, 8 bytes aligned.
+// cnt:      Count in 8-byte unit.
+// is_large: True when 'cnt' is known to be >= BlockZeroingLowLimit.
+void MacroAssembler::zero_words(Register base, Register cnt, bool is_large)
+{
+  if (UseBlockZeroing) {
+    Label non_block_zeroing;
+    block_zeroing(base, cnt, non_block_zeroing, is_large);

Always use the imperative form of a verb for methods: "block_zero",
not "block_zeroing".

 // base:   Address of a buffer to be zeroed, 8 bytes aligned.
-// cnt:    Count in 8-byte unit.
-void MacroAssembler::zero_words(Register base, Register cnt)
+// cnt:    Immediate count in 8-byte unit.

Please make this
// cnt: count in HeapWords

Thanks,

Andrew.


From vladimir.kozlov at oracle.com  Mon Apr 18 07:00:28 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 18 Apr 2016 00:00:28 -0700
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E631AF@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
	<57103B20.1040207@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E631AF@FMSMSX102.amr.corp.intel.com>
Message-ID: <5714860C.5030702@oracle.com>

This looks good. I will start our testing for it.

Thanks,
Vladimir

On 4/15/16 9:20 PM, Berg, Michael C wrote:
> Vladimir/Christian:
>
> I believe I have addressed all concerns in this update:
>
> Webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/
>
> Regards,
> Michael
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
> Sent: Friday, April 15, 2016 2:04 AM
> To: 'Vladimir Kozlov' <vladimir.kozlov at oracle.com>
> Cc: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: RE: CR for RFR 8153998
>
> Vladimir, the code has been updated and is available at:
>
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Berg, Michael C
> Sent: Thursday, April 14, 2016 5:54 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: RE: CR for RFR 8153998
>
> Ok, now that we understand one another: I have added a size calculation for the new pattern match, which will accurately map the emit size in the 32-bit and 64-bit .ad files for CountedLoopEnd.
> It will be clean when next you see the code.
>
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 5:52 PM
> To: Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>
> On 4/14/16 5:12 PM, Berg, Michael C wrote:
>> The restore mask is sizeless; it restores k1 to the known global configuration value, which is used automatically throughout code gen.
>
> How is it sizeless when it generates a kmovwl() instruction?
> Do you mean it does not have side effects (no flags modified)?
>
> Vladimir
>
>>
>> Ok, I will try the pattern match method.
>>
>> Thanks
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 5:02 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger
>> <christian.thalinger at oracle.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>>
>> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>>> Vladimir,
>>>
>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>>>
>>> I tried something like that early on with CountedLoopEnd.
>>
>> In the mach version with the mask restore you should specify the correct size(X), or not specify it at all so that it is calculated dynamically (done automatically).
>> I don't see any side effects for restoremask in your code. What are you talking about?
>>
>> I am suggesting something like next:
>>
>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>      predicate(n->has_vect_mask_set());
>>      match(CountedLoopEnd cop cr);
>>      effect(USE labl);
>>
>>      ins_cost(400);
>>      format %{ "j$cop     $labl\t# loop end\n\t"
>>                "restoremask \t# vector mask restore for loops"
>>             %}
>>      ins_encode %{
>>        Label* L = $labl$$label;
>>        __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>        __ restoremask();
>>      %}
>>      ins_pipe(pipe_jcc);
>> %}
>>
>> Vladimir
>>
>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process it via a flag, or keep it as I do it now?  We would basically be doing it in the same place.
>>>
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, April 14, 2016 4:27 PM
>>> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg,
>>> Michael C <michael.c.berg at intel.com>
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: CR for RFR 8153998
>>>
>>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>>
>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>
>>>>> Christian,
>>>>> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>>
>>>> That's unfortunate but I understand.  I'm fine with it then.
>>>
>>> You can try to generate the restore mask on loop exit in a mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note, CreateMaskI will only be generated for a clean counted post loop with one vector iteration - there should not be any other branches there.
>>>
>>> Vladimir
>>>
>>>>
>>>>> Regards,
>>>>> Michael
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C
>>>>> <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>
>>>>>        On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>        See below for context.
>>>>>        Regards,
>>>>>        Michael
>>>>>        *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>        *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>>>        *To:*Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>>        *Cc:*hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>        *Subject:*Re: CR for RFR 8153998
>>>>>
>>>>>            On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>            Hi Folks,
>>>>>
>>>>>            I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>>            This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>>            Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x
>>>>>            performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>>            This code was tested as follows (see JBS entry below):
>>>>>
>>>>>            Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>>
>>>>>            webrev:
>>>>>            http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>>
>>>>>
>>>>> +//------------------------------MachMskNode-----------------------------------
>>>>>
>>>>>        +// Machine function Msk Node
>>>>>
>>>>>        +class MachMskNode : public MachIdealNode {
>>>>>
>>>>>        Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>>>>        <MCB> Ok, that's easy enough.
>>>>>        Also, I don't quite understand why we have:
>>>>>
>>>>>        +instruct set_mask(rRegI dst, rRegI src) %{
>>>>>
>>>>>        +  predicate(VM_Version::supports_avx512vl());
>>>>>
>>>>>        +  match(Set dst (MaskCreateI src));
>>>>>
>>>>>        +  effect(TEMP dst);
>>>>>
>>>>>        +  format %{ "createmsk   $dst, $src" %}
>>>>>
>>>>>        +  ins_encode %{
>>>>>
>>>>>        +    __ createmsk($dst$$Register, $src$$Register);
>>>>>
>>>>>        +  %}
>>>>>
>>>>>        but:
>>>>>
>>>>>        +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>>
>>>>>        +    MacroAssembler _masm(&cbuf);
>>>>>
>>>>>        +    __ restoremsk();
>>>>>
>>>>>        +  }
>>>>>
>>>>>        The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>>>        The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>>>        The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>>>        The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>>>
>>>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>>>
>>>>>            Thanks,
>>>>>            Michael
>>>>
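A scalar C++ model of the masked post loop discussed in this thread may help readers follow the set_mask / restore exchange. It assumes a 16-lane int vector (one 512-bit EVEX register). Here lane_mask plays the role of createmsk, turning the remaining trip count into the bit pattern placed in k1, and masked_add emulates one masked vector iteration, which vector_add then uses as the single-iteration post loop. All names and the lane count are illustrative assumptions, not code from the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed vector width: 16 x 32-bit lanes (one 512-bit EVEX register).
constexpr int kLanes = 16;

// Full mask: the default k1 value that a "restoremask" puts back.
constexpr uint32_t kAllLanes = (1u << kLanes) - 1u;

// Low 'remaining' bits set: what a createmsk-style instruction would
// load into k1 for the post loop (remaining = iterations left over).
inline uint32_t lane_mask(int remaining) {
  assert(remaining >= 0 && remaining <= kLanes);
  return (1u << remaining) - 1u;
}

// Emulate one masked vector iteration: lanes whose mask bit is clear are
// neither read nor written, so the tail never touches memory past n.
void masked_add(int* dst, const int* a, const int* b, uint32_t mask) {
  for (int lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) {
      dst[lane] = a[lane] + b[lane];
    }
  }
}

// dst = a + b over n elements: full-width main loop, then one masked
// iteration in place of a scalar fix-up loop.
void vector_add(int* dst, const int* a, const int* b, size_t n) {
  size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    masked_add(dst + i, a + i, b + i, kAllLanes);  // all lanes enabled
  }
  if (i < n) {
    masked_add(dst + i, a + i, b + i, lane_mask(static_cast<int>(n - i)));
  }
}
```

For n = 19, the main loop handles elements 0 to 15 and the post loop runs once with mask 0x7 for elements 16 to 18.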

From martin.doerr at sap.com  Mon Apr 18 07:31:36 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 18 Apr 2016 07:31:36 +0000
Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
In-Reply-To: <57114147.5060206@oracle.com>
References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap>
	<57020636.7010806@oracle.com>
	<8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap>
	<5704C48C.2070502@oracle.com>
	<b5d4578b1b6c47d1a57421cbafec5860@DEWDFE13DE14.global.corp.sap>
	<5704F8DA.9030000@oracle.com>
	<a390742c041b4803b1b0ebe63e7a3598@DEWDFE13DE14.global.corp.sap>
	<57061D3E.8050408@oracle.com>
	<f04877e6fb2d47e0a01b9e92ebf4b07a@DEWDFE13DE14.global.corp.sap>
	<570632FF.7090103@redhat.com>
	<805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap>
	<63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap>
	<5710478C.8050200@oracle.com> <57109BE2.1090602@oracle.com>
	<57114147.5060206@oracle.com>
Message-ID: <4e54bacd93284faf9843b60f98314de5@DEWDFE13DE14.global.corp.sap>

Thanks everybody for the discussion, for reviewing and for sponsoring.

Best regards,
Martin

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Freitag, 15. April 2016 21:30
To: Jamsheed C m <jamsheed.c.m at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe

Thank you, Jamsheed. Testing results look fine so far. I am pushing it.

Thanks,
Vladimir

On 4/15/16 12:44 AM, Jamsheed C m wrote:
> Hi Vladimir,
>
> PIT testing is in progress; the link is available in the bug report.
>
> Best Regards,
> Jamsheed
>
> On 4/15/2016 7:14 AM, Vladimir Kozlov wrote:
>> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/12/16 2:45 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> I think we have come to a common understanding and there was no complaint about my latest webrev:
>>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>>>
>>> Can I consider it reviewed?
>>> Can somebody sponsor, please?
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin
>>> Sent: Donnerstag, 7. April 2016 12:52
>>> To: Andrew Haley <aph at redhat.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
>>>
>>> Hi Andrew, Jamsheed and all,
>>>
>>> thank you very much for your input.
>>>
>>> As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count().
>>> Therefore, I have replaced the storestore barrier introduced with JDK-8143897  (even though this barrier was also
>>> correct).
>>>
>>> My change still contains a releasing store for newly created ExceptionCache instances.
>>> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce
>>> additional false negatives on weak memory model platforms.
>>> I think having the release doesn't hurt too much and makes the design a little cleaner.
>>>
>>> I also added comments based on your input.
>>>
>>> The new webrev is here:
>>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/
>>>
>>> Please review. I will also need a sponsor from Oracle, please.
>>>
>>> Thanks again and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Andrew Haley [mailto:aph at redhat.com]
>>> Sent: Donnerstag, 7. April 2016 12:14
>>> To: Doerr, Martin <martin.doerr at sap.com>; Jamsheed C m <jamsheed.c.m at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe
>>>
>>> On 07/04/16 10:08, Doerr, Martin wrote:
>>>
>>>> atomic update for the _count would only be required if there were
>>>> multiple threads which attempt to increment it
>>>> concurrently. However, updates are under lock, so we only have
>>>> concurrent readers which is ok.
>>>>
>>>> I still think "volatile" does what we need here. Especially the xlC
>>>> compiler on AIX tends to reload variables from memory. Exactly this
>>>> can be prevented by making the field volatile.
>>>
>>> I think your latest patch is OK.  Whether volatile is really good
>>> enough, I don't know.  The new(ish) C++ memory model treats this as a
>>> race, and therefore undefined behaviour.  Old C++ didn't have a memory
>>> model, so the best we can do with racy code is guess about what our
>>> compilers might do.
>>>
>>> I certainly much prefer a release_store to the storestore fence used
>>> in the fix for 8143897.
>>>
>>> Andrew.
>>>
>
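The publication pattern discussed in this thread can be sketched with C++11 atomics: a writer fills in an entry under the lock and then makes it visible with a releasing store to the count, while lock-free readers load the count with acquire semantics before reading entries. This is a stand-alone model. std::atomic with memory_order_release / memory_order_acquire substitutes for HotSpot's own ordering primitives, and the class and field names are hypothetical, not the nmethod ExceptionCache layout.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical fixed-size cache of (pc, handler) pairs.
class TinyExceptionCache {
 public:
  static constexpr int kCapacity = 16;

  // Writers are serialized by the lock, so the count needs no atomic
  // read-modify-write; it only needs ordered publication.
  bool add(long pc, long handler) {
    std::lock_guard<std::mutex> guard(lock_);
    int n = count_.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;
    pcs_[n] = pc;              // plain stores fill in the entry...
    handlers_[n] = handler;
    // ...and the releasing store publishes it: a reader that sees n + 1
    // is guaranteed to see the entry written above.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Lock-free lookup. A stale (smaller) count only causes a miss, i.e. a
  // false negative, never a read of a half-written entry.
  long lookup(long pc) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; ++i) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return 0;  // not found
  }

 private:
  std::mutex lock_;
  std::atomic<int> count_{0};
  long pcs_[kCapacity] = {};
  long handlers_[kCapacity] = {};
};
```

Because existing entries are never rewritten, readers and the writer never touch the same slot concurrently; the only shared mutable word is the count.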

From zoltan.majo at oracle.com  Mon Apr 18 07:36:40 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Mon, 18 Apr 2016 09:36:40 +0200
Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if
	on-stack-replacement is enabled
In-Reply-To: <57113950.9070700@oracle.com>
References: <571107CD.8070205@oracle.com> <57113950.9070700@oracle.com>
Message-ID: <57148E88.9010905@oracle.com>

Thank you, Vladimir, for the review!

Best regards,


Zoltan

On 04/15/2016 08:56 PM, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 4/15/16 8:25 AM, Zoltán Majó wrote:
>> Hi,
>>
>>
>> please review the patch for 8072428.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8072428
>>
>> Problem: On-stack-replacement requires loop counters; disabling loop 
>> counters with on-stack-replacement enabled triggers
>> an assert.
>>
>> Solution: Set UseLoopCounter ergonomically if on-stack-replacement is 
>> enabled. Print a warning.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/
>>
>> Tested with locally-built VM (linux_x64).
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
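The ergonomic rule described in the problem and solution above can be modelled in a few lines. This is a hypothetical stand-alone sketch: the Flags struct and apply_loop_counter_ergonomics are illustrative names, not HotSpot's argument-processing code, and the warning text is made up.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for the two VM flags involved.
struct Flags {
  bool UseOnStackReplacement = true;
  bool UseLoopCounter = true;
};

// OSR relies on loop counters, so if the counters were disabled while OSR
// is still enabled, turn them back on and warn instead of letting the VM
// hit an assert later. Returns true if a flag was changed.
bool apply_loop_counter_ergonomics(Flags& f) {
  if (f.UseOnStackReplacement && !f.UseLoopCounter) {
    std::fprintf(stderr,
                 "warning: on-stack-replacement requires loop counters; "
                 "enabling UseLoopCounter\n");
    f.UseLoopCounter = true;
    return true;
  }
  return false;
}
```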


From edward.nevill at gmail.com  Mon Apr 18 08:10:51 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 18 Apr 2016 09:10:51 +0100
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
Message-ID: <1460967051.10749.31.camel@mint>

On Fri, 2016-04-15 at 20:45 +0800, Long Chen wrote:
> Hi
>  
> Please review this patch making use of DC ZVA to do block zeroing.
>  
> http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.patch
>  
> I'm sorry that I can't produce a test case matching the 'clear_array' pattern that shows an obvious improvement. However, generating 'DC ZVA' should be the right thing to do, as it usually has better cache behavior. Besides, gcc and Linux's memset have been using 'DC ZVA'.
>  

Hi Long,

Thanks for this. I have benchmarked this on 3 different partners' HW using the following JMH test case:

http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java

On two partners' HW I see a significant improvement; on the third partner's HW performance is almost identical.

Here are the results I get with the original normalised to 100 sec to avoid disclosing any absolute performance figures.

Partner A, Original = 100 sec, revised = 100.7 sec
Partner B, Original = 100 sec, revised = 97.6 sec
Partner C, Original = 100 sec, revised = 91.2 sec

One small improvement might be to avoid using a tmp register, which has to be allocated here

-instruct clearArray_imm_reg(immL cnt, iRegP base, Universe dummy, rFlagsReg cr)
+instruct clearArray_imm_reg(immL cnt, iRegP base, iRegLNoSp tmp, Universe dummy, rFlagsReg cr)

-    __ zero_words($base$$Register, (u_int64_t)$cnt$$constant);
+    __ zero_words($base$$Register, (u_int64_t)$cnt$$constant, $tmp$$Register);

by using 'lr' as the tmp register here

+  } else if (UseBlockZeroing && cnt >= (u_int64_t)(BlockZeroingLowLimit >> LogBytesPerWord)) {
+    mov(tmp, cnt);
+    zero_words(base, tmp, true);

AFAIK, 'lr' is always available as a tmp register in C2 generated code.

All the best,
Ed.
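The structure of block zeroing with 'DC ZVA' can be modelled portably: zero word by word up to a block boundary, clear whole blocks, then finish the tail word by word, taking the block path only when the count is large enough. In this sketch, which illustrates the idea rather than reproducing the patch, std::memset over one aligned block stands in for a single 'DC ZVA'. The 64-byte block size (real hardware reports it in DCZID_EL0) and the threshold are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed 'DC ZVA' block size; real hardware reports it in DCZID_EL0.
constexpr size_t kZvaBytes = 64;
constexpr size_t kWordsPerBlock = kZvaBytes / sizeof(uint64_t);
// Hypothetical analogue of BlockZeroingLowLimit, expressed in words.
constexpr size_t kBlockZeroingLowLimitWords = 2 * kWordsPerBlock;

// Zero 'cnt' 8-byte words starting at 8-byte-aligned 'base'.
// Returns the number of whole blocks cleared on the "DC ZVA" path.
size_t zero_words(uint64_t* base, size_t cnt) {
  size_t blocks = 0;
  if (cnt >= kBlockZeroingLowLimitWords) {
    // Head: plain stores until base reaches a block boundary (<= 7 words).
    while (reinterpret_cast<uintptr_t>(base) % kZvaBytes != 0) {
      *base++ = 0;
      --cnt;
    }
    // Body: one "DC ZVA" per aligned block.
    while (cnt >= kWordsPerBlock) {
      std::memset(base, 0, kZvaBytes);
      base += kWordsPerBlock;
      cnt -= kWordsPerBlock;
      ++blocks;
    }
  }
  // Tail, or the whole range if it was too small for block zeroing.
  while (cnt > 0) {
    *base++ = 0;
    --cnt;
  }
  return blocks;
}
```

Zeroing 40 words starting one word past a 64-byte boundary takes 7 head stores, 4 blocks, and 1 tail store.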


From zoltan.majo at oracle.com  Mon Apr 18 09:22:38 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Mon, 18 Apr 2016 11:22:38 +0200
Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after
	eliminating phi with unique input
In-Reply-To: <5711248C.7000503@oracle.com>
References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com>
	<571104A9.7060208@oracle.com> <5711248C.7000503@oracle.com>
Message-ID: <5714A75E.8050300@oracle.com>

Hi Vladimir,


On 04/15/2016 07:27 PM, Vladimir Kozlov wrote:
> Looks good to me.

thank you for the review!

Best regards,


Zoltan

>
> thanks,
> Vladimir
>
> On 4/15/16 8:11 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for the feedback!
>>
>> On 04/15/2016 02:46 AM, Vladimir Kozlov wrote:
>>> I think the check should use !isa_oopptr() since one of the nodes could be 
>>> a ConP NULL ptr, which is not a klassptr.
>>
>> Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/
>>
>> RBT testing passes. I did ~70 runs with the reproducer, no problems 
>> have shown up so far. I'll do ~900 more runs, though.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/14/16 6:21 AM, Zoltán Majó wrote:
>>>> Hi,
>>>>
>>>>
>>>> please review the patch for 8153357.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8153357
>>>>
>>>> Problem: When determining the unique input of a phi, the C2 
>>>> compiler removes cast nodes connecting the phi to its
>>>> unique input.
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 
>>>>
>>>>
>>>> Then (if the phi has indeed a unique input), the C2 compiler 
>>>> attempts to replace the phi with a cast node. The new cast
>>>> node feeds from the unique input.
>>>>
>>>> To be able to remove the phi node, the C2 compiler must 
>>>> determine the type of cast to add in place of the phi
>>>> node (CastII, CastPP, or CheckCastPP).
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 
>>>>
>>>>
>>>> The failure in the bug report appears because the C2 compiler adds 
>>>> a cast node of unexpected type to the graph (a
>>>> CheckCastPP instead of a CastPP when casting between two klass 
>>>> pointers).
>>>>
>>>> Please find more details about the cause of the failure in the bug 
>>>> description:
>>>> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 
>>>>
>>>>
>>>>
>>>>
>>>> Solution: Refine C2's logic to determine the type of cast node added.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/
>>>>
>>>> Testing:
>>>> - JPRT;
>>>> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp);
>>>> - 500 non-failing runs with the reproducer (the problem reproduces 
>>>> with < 100 runs).
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>
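A heavily simplified model of the cast-selection decision discussed here may help. An int phi is replaced by a CastII. A pointer phi gets a CheckCastPP only when both the phi and its unique input are oop pointers; anything else (a klass pointer, or the NULL ConP that is not a klass pointer either) gets a plain CastPP. This is one reading of the thread, not the code in cfgnode.cpp, which inspects C2 Type objects and handles more cases; the Kind and Cast enums are hypothetical.

```cpp
#include <cassert>

// Hypothetical, coarse classification of a node's type.
enum class Kind { Int, OopPtr, KlassPtr, NullPtr };

enum class Cast { CastII, CastPP, CheckCastPP };

// Choose the cast that replaces a phi with a unique input.
// CheckCastPP sharpens an oop's klass, so in this model it is used only
// when both sides are oop pointers; other pointers get a CastPP.
Cast choose_cast(Kind phi, Kind input) {
  if (phi == Kind::Int) return Cast::CastII;
  bool both_oops = (phi == Kind::OopPtr) && (input == Kind::OopPtr);
  return both_oops ? Cast::CheckCastPP : Cast::CastPP;
}
```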


From nils.eliasson at oracle.com  Mon Apr 18 09:39:23 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 18 Apr 2016 11:39:23 +0200
Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile
	before compilebroker init"
In-Reply-To: <5711215D.4060202@oracle.com>
References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com>
	<5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com>
	<5710CF13.5090404@oracle.com> <5711215D.4060202@oracle.com>
Message-ID: <5714AB4B.9070200@oracle.com>

Thank you Vladimir,

I have verified the test executes in JPRT.

Regards,
Nils

On 2016-04-15 19:14, Vladimir Kozlov wrote:
> Looks good. Make sure the test is executed in JPRT.
>
> Thanks,
> Vladimir
>
> On 4/15/16 4:22 AM, Nils Eliasson wrote:
>> Hi Tobias,
>>
>> Thanks for your feedback!
>>
>> New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03
>>
>> Regards,
>> Nils
>>
>> On 2016-04-15 13:15, Tobias Hartmann wrote:
>>> Hi Nils,
>>>
>>> On 15.04.2016 11:39, Nils Eliasson wrote:
>>>> Thanks Vladimir!
>>>> On 2016-04-15 01:41, Vladimir Kozlov wrote:
>>>>> I agree with this simple change as the fix.
>>>>> Note, -Xcomp does not switch off Interpreter (we can run without 
>>>>> Interpreter). We use !UseInterpreter as indication
>>>>> if Xcomp was used.
>>>>> I don't see a PIT link in the bug report.
>>>> There was none; Tobias found this regression while testing something else.
>>>>
>>>> Now I have added a regression test: 
>>>> hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/
>>> Please set the test copyright date to 2016. I would maybe also 
>>> change the test summary to what you wrote in line 30
>>> ("Sanity test flag combo..") because this has nothing to do with 
>>> support for blocking compiles.
>>>
>>> Otherwise looks good to me.
>>>
>>> Best regards,
>>> Tobias
>>>
>>>> Regards,
>>>> Nils
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/14/16 6:17 AM, Nils Eliasson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Please review this fix.
>>>>>>
>>>>>> Summary:
>>>>>> In JDK-8150646 I added an assert in compile_method that the 
>>>>>> compiler must not be NULL. Before there was a return
>>>>>> there that just ignored the compile.
>>>>>>
>>>>>> Running the VM with the flag combination -Xcomp and 
>>>>>> -XX:TieredStopAtLevel=0 creates a special situation:
>>>>>> UseInterpreter is set to false (but the interpreter is still 
>>>>>> available) and then some
>>>>>> essential methods are forced to be compiled, but the initial 
>>>>>> complevel becomes 0 and hits the assert in compileBroker.
>>>>>>
>>>>>> Solution:
>>>>>> We could discuss if it should be allowed to submit compiles on 
>>>>>> level 0, a change that would become a bit larger.
>>>>>> This time I chose to extend the _initialized check in 
>>>>>> compile_method. I didn't add any
>>>>>> logging or warning because this is really a corner case.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151
>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/
>>>>>> (Ignore the extra tags in the webrev)
>>>>>>
>>>>>> Best regards,
>>>>>> Nils Eliasson
>>


From nils.eliasson at oracle.com  Mon Apr 18 10:24:11 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 18 Apr 2016 12:24:11 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <B3A4C7A2-E9B6-403A-BF0A-8B64049F4338@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
	<CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
	<5710E772.5050801@oracle.com>
	<B3A4C7A2-E9B6-403A-BF0A-8B64049F4338@oracle.com>
Message-ID: <5714B5CB.70705@oracle.com>

Hi,

On 2016-04-15 22:43, Christian Thalinger wrote:
>
>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson <nils.eliasson at oracle.com 
>> <mailto:nils.eliasson at oracle.com>> wrote:
>>
>> Hi,
>>
>> On 2016-04-14 20:45, Christian Thalinger wrote:
>>>
>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson 
>>>> <nils.eliasson at oracle.com> wrote:
>>>>
>>>> I moved the reasons to CompileTask.hpp and put it together with the 
>>>> names list. Also changed the type from int to CompileReason as Igor 
>>>> suggested.
>>>>
>>>> It gets verbose in the method declarations in compileBroker
>>>
>>> Don't worry about this.
>>>
>>>> and sometimes I think CompileReason should be declared in 
>>>> CompileBroker because it is mostly used there. On the other hand, 
>>>> CompileTask is the keeper of the CompileReason so it makes sense too.
>>>
>>> Yes, that's the right place.
>>>
>>>>
>>>> New webrev:
>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ 
>>>> <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.03/>
>>>
>>> + bool can_become_stale() const {
>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox);
>>> + }
>>> I'm not a fan of implicit contracts just defined by comments.  This 
>>> method doesn't seem to be performance critical so I would suggest 
>>> using a switch-case.  An attribute on the enum would be much better 
>>> but we all know this isn't Java.
>>
>> As you suggested:
>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04
>
> Thanks.  A space is missing and the closing } indent is wrong:
> + bool can_become_stale() const {
> + switch(_compile_reason) {
> + case Reason_BackedgeCount:
> + case Reason_InvocationCount:
> + case Reason_Tiered:
> + return !_is_blocking;
> + }
> + return false;
> + }
> Also, what about:
> + Reason_None,
> + Reason_CTW, // Compile the world
> + Reason_Replay, // ciReplay
> These were covered before.
Reason_None - is only used for bounds checking together with Reason_Count.
Reason_Replay - if these compilations can get stale we can get 
indeterminism in replay.
Reason_CTW - CTW could silently drop compiles -> more indeterminism.

Regards,
Nils

>
>>
>> Also made reasons CTW and Replay not stale-able.
>>
>> Thanks!
>> Nils
>>
>>>
>>>>
>>>> Thanks!
>>>> Nils
>>>>
>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>>>>> Very nice, I like it.
>>>>>
>>>>> One note. CompileReason (and its names) should be in the CompileTask 
>>>>> class, where it is recorded. Then CompileTask::can_become_stale() 
>>>>> can be in the header file so it is inlined on all platforms.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> New webrev:
>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ 
>>>>>> <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.02/>
>>>>>>
>>>>>> Summary
>>>>>> Introduced an enum CompileReason with members matching all the old
>>>>>> variants, and a table containing all the unchanged strings. I see the
>>>>>> possibility of removing/changing/simplifying some CompileReasons but
>>>>>> have chosen not to do so in this change.
>>>>>>
>>>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>>>
>>>>>> Testing:
>>>>>> Running Testset hotspot on all platforms and hotspot_all on one 
>>>>>> platform
>>>>>>
>>>>>> Regards,
>>>>>> Nils Eliasson
>>>>>>
>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>>>>> Tasks get evicted from the compile_queue if their invocation 
>>>>>>>> counter
>>>>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>>>>
>>>>>>>> I'll do a proper fix, it is the right thing to do and should be 
>>>>>>>> pretty
>>>>>>>> quick. I'll change the comment to an enum that represent who 
>>>>>>>> submitted
>>>>>>>> the compile, and add a table for the comments. This could be 
>>>>>>>> useful in
>>>>>>>> other settings too.
>>>>>>>
>>>>>>> Sounds good.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nils
>>>>>>>>
>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>>>>> What do you mean "stale"?
>>>>>>>>> I would prefer to see the real fix as you suggested to avoid 
>>>>>>>>> removing
>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>>>>
>>>>>>>>>> Summary:
>>>>>>>>>> A method enqueued for compilation with the WB API may be 
>>>>>>>>>> removed from
>>>>>>>>>> the compile queue as stale.
>>>>>>>>>>
>>>>>>>>>> Solution:
>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>>>>> stale while the test is running. (Also added some extra
>>>>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>>>>>
>>>>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>>>>> task with info about the origin of the compile. The comment 
>>>>>>>>>> field has
>>>>>>>>>> this information - but then it needs to be
>>>>>>>>>> converted to an enum.
>>>>>>>>>>
>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Nils Eliasson
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>
>
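A compilable sketch of the switch-based can_become_stale() discussed above, assuming only the enum members named in this thread (the real CompileTask has more reasons and fields). Listing the stale-able reasons explicitly means a newly added reason defaults to "never stale" instead of depending on its position relative to Reason_Whitebox. CompileTaskSketch is a hypothetical name.

```cpp
#include <cassert>

// Subset of the compile reasons mentioned in this thread.
enum CompileReason {
  Reason_None,             // only used for bounds checking
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,              // Compile the world
  Reason_Replay,           // ciReplay
  Reason_Whitebox,
  Reason_Count             // number of reasons
};

class CompileTaskSketch {
 public:
  CompileTaskSketch(CompileReason reason, bool is_blocking)
      : _compile_reason(reason), _is_blocking(is_blocking) {}

  // Only non-blocking, counter-driven compiles may be evicted from the
  // queue as stale; CTW, replay and WhiteBox requests are never dropped.
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        return false;
    }
  }

 private:
  CompileReason _compile_reason;
  bool _is_blocking;
};
```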

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/c287b594/attachment-0001.html>

From ENOMIKI at jp.ibm.com  Mon Apr 18 10:36:39 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Mon, 18 Apr 2016 10:36:39 +0000
Subject: PPC64 VMX/VSX array copy stubs
Message-ID: <201604181036.u3IAarDQ018409@d19av08.sagamino.japan.ibm.com>

An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/43a45ef6/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ArrayCopyTest1.java
Type: application/octet-stream
Size: 3177 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/43a45ef6/ArrayCopyTest1-0001.java>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vmx.diff
Type: application/octet-stream
Size: 7923 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/43a45ef6/ppc64le_vmx-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vsx.diff
Type: application/octet-stream
Size: 7001 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/43a45ef6/ppc64le_vsx-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: result.jpg
Type: image/jpeg
Size: 32237 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/43a45ef6/result-0001.jpg>

From nils.eliasson at oracle.com  Mon Apr 18 11:24:00 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 18 Apr 2016 13:24:00 +0200
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <57112880.1010204@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com>
Message-ID: <5714C3D0.2070804@oracle.com>

Resizable is better, but then we hit an assert when the stringbuffer is 
expanded while under a different ResourceMark.

Regards,
Nils

On 2016-04-15 19:44, Vladimir Kozlov wrote:
> Use resizable stream:
>
> stringStream(size_t initial_bufsize = 256);
>
> 1024 may not be enough.
>
> Thanks,
> Vladimir
>
> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Please review this fix of print opto_assembly.
>>
>> Summary:
>> The compile log can get corrupted and the VM may assert with "failed:
>> bad tag in log".
>>
>> When printing assembly in output.cpp we first take the tty lock, print
>> the header and then the method metadata. However, printing the
>> metadata makes a VM entry, which may block for a safepoint and
>> will then release the lock (break_tty_lock_for_safepoint). After that,
>> another compiler thread that hasn't safepointed may take the lock,
>> and the log is broken once the safepoint is over and the first
>> thread starts logging again.
>>
>> Solution:
>> Print the method metadata to a temporary buffer, then take the tty lock.
>>
>> Testing:
>> Repro from bug stops failing.
>> Running :hotspot_all 
>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>
>> Regards,
>> Nils Eliasson
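The fix described above (print the method metadata to a temporary buffer, then take the tty lock) can be sketched in standard C++. This is a minimal illustration under stated assumptions, not the HotSpot code: `Log`, `print_assembly` and `format_method_metadata` are hypothetical names, a `std::mutex` stands in for the tty lock, and formatting the metadata stands in for the VM entry that may block at a safepoint. Anything that can block runs before the lock is taken, so the lock only guards one write of a finished record.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical shared log; the mutex plays the role of the tty lock.
class Log {
 public:
  // Writes one fully formatted record while holding the lock, so records
  // from different threads cannot interleave.
  void write_record(const std::string& record) {
    assert(!record.empty() && record.back() == '\n');  // whole lines only
    std::lock_guard<std::mutex> guard(lock_);
    out_ << record;
  }

  std::string contents() const {
    std::lock_guard<std::mutex> guard(lock_);
    return out_.str();
  }

 private:
  mutable std::mutex lock_;
  std::ostringstream out_;
};

// Hypothetical stand-in for printing method metadata. In the VM this step
// may enter the runtime and block for a safepoint, so it must not run
// while the tty lock is held.
std::string format_method_metadata(const std::string& method, int code_size) {
  std::ostringstream meta;
  meta << "  method=" << method << " code_size=" << code_size << '\n';
  return meta.str();
}

void print_assembly(Log& log, const std::string& method, int code_size) {
  // 1. Build the whole record in a private buffer, without the lock.
  std::ostringstream buf;
  buf << "<assembly>\n";
  buf << format_method_metadata(method, code_size);
  buf << "</assembly>\n";
  // 2. Take the lock only to emit the finished record.
  log.write_record(buf.str());
}
```

The design choice is simply to shrink the critical section to a single append; the cost is one extra copy of each record.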


From aph at redhat.com  Mon Apr 18 11:45:10 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Apr 2016 12:45:10 +0100
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <1460967051.10749.31.camel@mint>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<1460967051.10749.31.camel@mint>
Message-ID: <5714C8C6.3030302@redhat.com>

On 04/18/2016 09:10 AM, Edward Nevill wrote:
> I have benchmarked this on three different partners' hardware using the following JMH test case:
> 
> http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java

This isn't a great test for block zeroing.  Nevertheless, I have
approved this patch, with a few alterations.

A note about using jmh, not just for you but for everyone working on
this project.  Don't do something like

for (int i = 0; i < 10000; i++)
    theTest();

as an attempt to pad out the execution time.  jmh is much better at
this sort of thing than you are.  Just put the code you're trying to
test in the test case.

Andrew.

From aph at redhat.com  Mon Apr 18 12:55:12 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Apr 2016 13:55:12 +0100
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
Message-ID: <5714D930.4090804@redhat.com>

One other thing.  This is rather a lot of code to emit every time an
array is created:

 ;; zero_words {
  0x0000007fa880f5f0: cmp	x11, #0x20
  0x0000007fa880f5f4: b.lt	0x0000007fa880f62c

  0x0000007fa880f5f8: neg	x8, x10
  0x0000007fa880f5fc: and	x8, x8, #0x7f
  0x0000007fa880f600: cbz	x8, 0x0000007fa880f614
  0x0000007fa880f604: sub	x11, x11, x8, asr #3
  0x0000007fa880f608: sub	x8, x8, #0x8
  0x0000007fa880f60c: str	xzr, [x10],#8
  0x0000007fa880f610: cbnz	x8, 0x0000007fa880f608
  0x0000007fa880f614: sub	x11, x11, #0x10
  0x0000007fa880f618: dc	zva, x10
  0x0000007fa880f61c: subs	x11, x11, #0x10
  0x0000007fa880f620: add	x10, x10, #0x80
  0x0000007fa880f624: b.ge	0x0000007fa880f618
  0x0000007fa880f628: add	x11, x11, #0x10

  0x0000007fa880f62c: and	x8, x11, #0x7

I don't think this CBZ does anything useful:

  0x0000007fa880f630: cbz	x8, 0x0000007fa880f670

(I'm assuming that the 0-7 cases are uniformly distributed.)

  0x0000007fa880f634: sub	x11, x11, x8
  0x0000007fa880f638: add	x10, x10, x8, lsl #3
  0x0000007fa880f63c: adr	x9, 0x0000007fa880f670
  0x0000007fa880f640: sub	x9, x9, x8, lsl #2
  0x0000007fa880f644: br	x9
  0x0000007fa880f648: add	x10, x10, #0x40
  0x0000007fa880f64c: sub	x11, x11, #0x8
  0x0000007fa880f650: stur	xzr, [x10,#-64]
  0x0000007fa880f654: stur	xzr, [x10,#-56]
  0x0000007fa880f658: stur	xzr, [x10,#-48]
  0x0000007fa880f65c: stur	xzr, [x10,#-40]
  0x0000007fa880f660: stur	xzr, [x10,#-32]
  0x0000007fa880f664: stur	xzr, [x10,#-24]
  0x0000007fa880f668: stur	xzr, [x10,#-16]
  0x0000007fa880f66c: stur	xzr, [x10,#-8]
  0x0000007fa880f670: cbnz	x11, 0x0000007fa880f648
 ;; } zero_words
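For readers less familiar with AArch64, the listing above can be modeled in C++ roughly as follows. This is a sketch reconstructed from the disassembly, not the MacroAssembler source: `zero_words` and `dc_zva` are hypothetical names, `dc_zva` merely emulates `DC ZVA` with `memset`, and the 128-byte (16-word) block size and 32-word threshold are read off the `#0x80`, `#0x10` and `#0x20` immediates in this particular dump (on real hardware the ZVA block size comes from `DCZID_EL0`). The `switch` mirrors the computed branch (`adr`/`sub`/`br`) into the eight unrolled `stur` stores, and its `case 0` corresponds to the `cbz` questioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kZvaWords = 16;  // 0x80 bytes per DC ZVA in this dump
constexpr std::size_t kZvaBytes = kZvaWords * sizeof(std::uint64_t);
constexpr std::size_t kBlockThreshold = 32;  // cmp x11, #0x20

// Emulates "dc zva, x10": zero one kZvaBytes-aligned block.
static void dc_zva(std::uint64_t* p) {
  assert(reinterpret_cast<std::uintptr_t>(p) % kZvaBytes == 0);
  std::memset(p, 0, kZvaBytes);
}

// Zeroes cnt 8-byte words starting at base (x10 = base, x11 = cnt).
void zero_words(std::uint64_t* base, std::size_t cnt) {
  assert(reinterpret_cast<std::uintptr_t>(base) % sizeof(std::uint64_t) == 0);
  if (cnt >= kBlockThreshold) {
    // Store single words until base is block aligned (at most 15 words).
    while (reinterpret_cast<std::uintptr_t>(base) % kZvaBytes != 0) {
      *base++ = 0;
      --cnt;
    }
    // Clear whole blocks with DC ZVA.
    while (cnt >= kZvaWords) {
      dc_zva(base);
      base += kZvaWords;
      cnt -= kZvaWords;
    }
  }
  // Tail, part 1: cnt % 8 words via a jump into the unrolled stores.
  std::size_t rem = cnt % 8;
  base += rem;  // add x10, x10, x8, lsl #3
  cnt -= rem;   // sub x11, x11, x8
  switch (rem) {
    case 7: base[-7] = 0; [[fallthrough]];
    case 6: base[-6] = 0; [[fallthrough]];
    case 5: base[-5] = 0; [[fallthrough]];
    case 4: base[-4] = 0; [[fallthrough]];
    case 3: base[-3] = 0; [[fallthrough]];
    case 2: base[-2] = 0; [[fallthrough]];
    case 1: base[-1] = 0; [[fallthrough]];
    case 0: break;  // the cbz path
  }
  // Tail, part 2: the remaining multiple of 8 words, 8 stores per pass.
  while (cnt != 0) {
    base += 8;
    cnt -= 8;
    for (std::ptrdiff_t i = 1; i <= 8; ++i) base[-i] = 0;
  }
}
```

Seen this way, only the block path (alignment loop plus DC ZVA loop) depends on the large-array case, which is the part that could plausibly move out of line into a stub.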

We could think about moving the large block case into a stub which is
emitted after the main body of the method, or even into a shared stub.
A shared stub would require the args to be in fixed registers, though.

Andrew.


From ENOMIKI at jp.ibm.com  Sun Apr 17 18:28:01 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Sun, 17 Apr 2016 18:28:01 +0000
Subject: PPC64 VMX/VSX array copy stubs 
Message-ID: <201604171828.u3HIS9u0012295@d19av06.sagamino.japan.ibm.com>

Dear all,

Could you please review the following change?
I created two patches for generate_disjoint_long_copy, one using VMX (Vector
Multimedia Extension) and one using VSX (Vector-Scalar Extension).

Let me share our performance results.
I varied the array copy size for both aligned copies (the source and
destination alignments match) and unaligned copies.
That is, I measured the following four patterns together. Since long array
elements are only 8-byte aligned, these patterns cover both the aligned and
unaligned cases:
        System.arraycopy(src, 0, dst, 0, size);
        System.arraycopy(src, 0, dst, 1, size);
        System.arraycopy(src, 1, dst, 0, size);
        System.arraycopy(src, 1, dst, 1, size);

VMX(max) and VSX(max) are the aligned scores, while VMX(min) and VSX(min) are
the unaligned scores. Scalar is the original OpenJDK.
VSX performed better when the array size is less than about 2048 bytes,
but VSX(min) was worse than VMX for large arrays.
This is probably due to the alignment overhead in VSX.
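Why the (0,0) and (1,1) patterns behave differently from (0,1) and (1,0) can be made concrete with a little address arithmetic. The following C++ sketch assumes 16-byte vector registers (as for VMX/VSX) and 8-byte long elements; `elem_addr`, `mutually_aligned` and `peel_count` are hypothetical helpers, and the addresses used below are made up. A vector loop can use aligned loads and stores on both arrays only when source and destination sit at the same offset modulo 16; otherwise at least one side stays misaligned for the whole copy.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVectorBytes = 16;  // one VMX/VSX register
constexpr std::size_t kElemBytes = 8;     // sizeof(jlong)

// Byte address of element pos in a long[] whose first element is at base.
constexpr std::uintptr_t elem_addr(std::uintptr_t base, std::size_t pos) {
  return base + pos * kElemBytes;
}

// True if, after peeling some scalar elements, both src and dst can be
// accessed with aligned vector loads/stores.
constexpr bool mutually_aligned(std::uintptr_t src, std::uintptr_t dst) {
  return src % kVectorBytes == dst % kVectorBytes;
}

// Number of leading elements to copy one at a time before src is aligned.
constexpr std::size_t peel_count(std::uintptr_t src) {
  return (kVectorBytes - src % kVectorBytes) % kVectorBytes / kElemBytes;
}
```

With hypothetical 16-byte-aligned bases, patterns (0,0) and (1,1) are mutually aligned (the latter after peeling one element), while (0,1) and (1,0) are not.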


Server:  8247-22L (POWER8 (3.3GHz 12 cores) x2, 512GB memory), Ubuntu
Linux 15.04 ppc64LE (kernel: 3.19.0-18-generic),
OpenJDK (build based on 1.9), JVMARGS: "-Xmx40g -Xms40g -Xmn20g"


The benchmark code and patch files are attached.


The VMX version is currently implemented for little-endian PPC only. (The
patches were generated with "hg diff -g" in the latest hotspot directory.)

Related links:
"8154156: PPC64: improve array copy stubs by using vector instructions"
https://bugs.openjdk.java.net/browse/JDK-8154156
"PPC64 VSX load/store instructions in stubs"
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-April/002419.html


Regards,
Miki

+ + + + + + +
Miki ENOKI, Ph.D.
IBM Research - Tokyo


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 47807 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/attachment-0001.jpe>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vmx.diff
Type: application/octet-stream
Size: 7923 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ppc64le_vmx-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vsx.diff
Type: application/octet-stream
Size: 7001 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ppc64le_vsx-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ArrayCopyTest1.java
Type: application/octet-stream
Size: 3177 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ArrayCopyTest1-0001.java>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: result.jpg
Type: image/jpeg
Size: 62057 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/result-0001.jpg>

From ENOMIKI at jp.ibm.com  Mon Apr 18 07:47:45 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Mon, 18 Apr 2016 07:47:45 +0000
Subject: PPC64 VMX/VSX array copy stubs
Message-ID: <201604180747.u3I7lwYA004525@d19av08.sagamino.japan.ibm.com>

An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/b801c5bf/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1460965465853.jpg
Type: image/jpeg
Size: 62057 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/b801c5bf/Image.1460965465853-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: result.jpg
Type: image/jpeg
Size: 62057 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/b801c5bf/result-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vmx.diff
Type: application/octet-stream
Size: 7923 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/b801c5bf/ppc64le_vmx-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vsx.diff
Type: application/octet-stream
Size: 7001 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/b801c5bf/ppc64le_vsx-0001.diff>

From ENOMIKI at jp.ibm.com  Mon Apr 18 10:19:27 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Mon, 18 Apr 2016 10:19:27 +0000
Subject: PPC64 VMX/VSX array copy stubs
Message-ID: <201604181019.u3IAJh3E008402@d19av08.sagamino.japan.ibm.com>

An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/5e0f6f77/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ArrayCopyTest1.java
Type: application/octet-stream
Size: 3177 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/5e0f6f77/ArrayCopyTest1-0001.java>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vmx.diff
Type: application/octet-stream
Size: 7923 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/5e0f6f77/ppc64le_vmx-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64le_vsx.diff
Type: application/octet-stream
Size: 7001 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/5e0f6f77/ppc64le_vsx-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: result.jpg
Type: image/jpeg
Size: 62057 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/5e0f6f77/result-0001.jpg>

From vladimir.kozlov at oracle.com  Mon Apr 18 17:30:38 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 18 Apr 2016 10:30:38 -0700
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <5714C3D0.2070804@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com>
	<5714C3D0.2070804@oracle.com>
Message-ID: <571519BE.605@oracle.com>

tty would have the same problem, but it uses C_HEAP to allocate:

     defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream();

Please see if you can do something similar.
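The ResourceMark problem Nils hit, and why C-heap allocation sidesteps it, can be illustrated with a toy model. The sketch below is hypothetical and much simpler than HotSpot's resource area: `ToyArena` is a bump allocator whose `release(mark)` frees everything allocated after `mark`, roughly as leaving a ResourceMark does, and `arena_grow`/`cheap_grow` are made-up helpers. If a buffer created under an outer mark grows while an inner mark is active, its new storage belongs to the inner scope and is reclaimed when that scope ends, although the buffer is still in use. Storage from `malloc`/`realloc` has no such scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical bump-pointer arena with mark/release semantics.
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : mem_(capacity), top_(0) {}

  char* alloc(std::size_t n) {
    assert(top_ + n <= mem_.size());
    char* p = mem_.data() + top_;
    top_ += n;
    return p;
  }
  std::size_t mark() const { return top_; }
  // Everything allocated after m is considered dead from here on.
  void release(std::size_t m) { top_ = m; }

 private:
  std::vector<char> mem_;
  std::size_t top_;
};

// Grows an arena-backed buffer by allocating a larger copy in the arena,
// the way a resizable resource-area buffer would.
char* arena_grow(ToyArena& arena, const char* old, std::size_t old_len,
                 std::size_t new_len) {
  char* p = arena.alloc(new_len);
  std::memcpy(p, old, old_len);
  return p;
}

// The same growth on the C heap: the storage lives until free().
// Returns nullptr on failure, in which case `old` is still valid.
char* cheap_grow(char* old, std::size_t new_len) {
  return static_cast<char*>(std::realloc(old, new_len));
}
```

In the arena case the grown buffer's memory is handed out again by the next allocation after the inner mark is released; the C-heap buffer keeps its contents regardless of marks.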

Thanks,
Vladimir

On 4/18/16 4:24 AM, Nils Eliasson wrote:
> Resizable is better, but then we hit an assert when expanding the string
> buffer while under a different ResourceMark.
>
> Regards,
> Nils
>
> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>> Use resizable stream:
>>
>> stringStream(size_t initial_bufsize = 256);
>>
>> 1024 may not be enough.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> Please review this fix of print opto_assembly.
>>>
>>> Summary:
>>> The compile log can get corrupted and the VM may assert with "failed:
>>> bad tag in log".
>>>
>>> When printing assembly in output.cpp we first take the tty lock, print
>>> the header and then the method metadata. However, printing the
>>> metadata makes a VM entry, which may block for a safepoint and
>>> will then release the lock (break_tty_lock_for_safepoint). After that,
>>> another compiler thread that hasn't safepointed may take the lock,
>>> and the log is broken once the safepoint is over and the first
>>> thread starts logging again.
>>>
>>> Solution:
>>> Print the method metadata to a temporary buffer, then take the tty lock.
>>>
>>> Testing:
>>> Repro from bug stops failing.
>>> Running :hotspot_all
>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>>>
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>
>>> Regards,
>>> Nils Eliasson
>

From vivek.r.deshpande at intel.com  Mon Apr 18 17:38:14 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Mon, 18 Apr 2016 17:38:14 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>

Hi all

I would like to contribute a patch that makes it possible to control intrinsics in the interpreter, C1 and C2 by disabling the corresponding stub generation.
It uses the -XX:DisableIntrinsic option to achieve this.
Could you please review and sponsor this patch?

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

Thanks and regards,
Vivek

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160418/f853d9d6/attachment.html>

From christian.thalinger at oracle.com  Mon Apr 18 18:25:52 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 18 Apr 2016 08:25:52 -1000
Subject: CR for RFR 8153998
In-Reply-To: <C568518E7B433348B114B6A7122D474756E631AF@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
	<57103B20.1040207@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E631AF@FMSMSX102.amr.corp.intel.com>
Message-ID: <E6511FC8-F543-49F3-BD28-ABCD0D15DFF3@oracle.com>

Looks good.

> On Apr 15, 2016, at 6:20 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> Vladimir/Christian:
> 
> I believe I have addressed all concerns in this update:
> 
> Webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ 
> 
> Regards,
> Michael
> 
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
> Sent: Friday, April 15, 2016 2:04 AM
> To: 'Vladimir Kozlov' <vladimir.kozlov at oracle.com>
> Cc: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: RE: CR for RFR 8153998
> 
> Vladimir, the code has been updated and is available at:
> 
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ 
> 
> Thanks,
> Michael
> 
> -----Original Message-----
> From: Berg, Michael C
> Sent: Thursday, April 14, 2016 5:54 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: RE: CR for RFR 8153998
> 
> OK, now that we understand one another: I have added a size calculation for the new pattern match that accurately maps the emitted size of CountedLoopEnd in the 32-bit and 64-bit .ad files.
> It will be clean when next you see the code.
> 
> -Michael
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 5:52 PM
> To: Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
> 
> On 4/14/16 5:12 PM, Berg, Michael C wrote:
>> The restore mask is sizeless; it restores k1 to the known global configuration value, which is used automatically throughout code gen.
> 
> How is it sizeless when it generates a kmovwl() instruction?
> Do you mean it does not have side effects (no flags modified)?
> 
> Vladimir
> 
>> 
>> Ok, I will try the pattern match method.
>> 
>> Thanks
>> -Michael
>> 
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 5:02 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger 
>> <christian.thalinger at oracle.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>> 
>> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>>> Vladimir,
>>> 
>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>>> 
>>> I tried something like that early on with CountedLoopEnd.
>> 
>> In the mach version with the restore mask you should either specify the correct size(X) or not specify it at all, so that it is calculated dynamically (done automatically).
>> I don't see any side effects for restoremask in your code. What are you talking about?
>> 
>> I am suggesting something like next:
>> 
>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>    predicate(n->has_vect_mask_set());
>>    match(CountedLoopEnd cop cr);
>>    effect(USE labl);
>> 
>>    ins_cost(400);
>>    format %{ "j$cop     $labl\t# loop end\n\t"
>>              "restoremask \t# vector mask restore for loops"
>>           %}
>>    ins_encode %{
>>      Label* L = $labl$$label;
>>      __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>      __ restoremask();
>>    %}
>>    ins_pipe(pipe_jcc);
>> %}
>> 
>> Vladimir
>> 
>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or how I do it now?  We would basically be doing it in the same place.
>>> 
>>> -Michael
>>> 
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, April 14, 2016 4:27 PM
>>> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg, 
>>> Michael C <michael.c.berg at intel.com>
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: CR for RFR 8153998
>>> 
>>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>> 
>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>> 
>>>>> Christian,
>>>>> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>> 
>>>> That's unfortunate but I understand.  I'm fine with it then.
>>> 
>>> You can try to generate the restore mask on loop exit in a mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after the CreateMaskI node is created. Note that CreateMaskI will only be generated for a clean counted post loop with one vector iteration; there should not be any other branches there.
>>> 
>>> Vladimir
>>> 
>>>> 
>>>>> Regards,
>>>>> Michael
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C 
>>>>> <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>> *Subject:*Re: CR for RFR 8153998
>>>>> 
>>>>>      On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>      See below for context.
>>>>>      Regards,
>>>>>      Michael
>>>>>      *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>      *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>>>      *To:*Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>>      *Cc:*hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>      *Subject:*Re: CR for RFR 8153998
>>>>> 
>>>>>          On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>          Hi Folks,
>>>>> 
>>>>>          I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>>          This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>>          Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets.  It delivers up to 2x
>>>>>          performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>>          This code was tested as follows (see JBS entry below):
>>>>>
>>>>>          Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>> 
>>>>>          webrev:
>>>>>          http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>> 
>>>>> 
>>>>> +//------------------------------MachMskNode-----------------------------------
>>>>> 
>>>>>      +// Machine function Msk Node
>>>>> 
>>>>>      +class MachMskNode : public MachIdealNode {
>>>>> 
>>>>>      Does 'Msk' mean mask?  Then we should call it MachMaskNode.
>>>>>      <MCB> OK, that's easy enough.
>>>>>      Also, I don't quite understand why we have:
>>>>> 
>>>>>      +instruct set_mask(rRegI dst, rRegI src) %{
>>>>> 
>>>>>      +  predicate(VM_Version::supports_avx512vl());
>>>>> 
>>>>>      +  match(Set dst (MaskCreateI src));
>>>>> 
>>>>>      +  effect(TEMP dst);
>>>>> 
>>>>>      +  format %{ "createmsk   $dst, $src" %}
>>>>> 
>>>>>      +  ins_encode %{
>>>>> 
>>>>>      +    __ createmsk($dst$$Register, $src$$Register);
>>>>> 
>>>>>      +  %}
>>>>> 
>>>>>      but:
>>>>> 
>>>>>      +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>> 
>>>>>      +    MacroAssembler _masm(&cbuf);
>>>>> 
>>>>>      +    __ restoremsk();
>>>>> 
>>>>>      +  }
>>>>> 
>>>>>      The reason: currently k registers (mask registers) are not allocated, which means we have to treat their usage as side effects.
>>>>>      The set_mask rule takes the remaining iterations as an index and provides a mask in k1 that is applicable to the post loop.
>>>>>      The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that the side-effect value survives optimization.
>>>>>      The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions.
>>>>> 
>>>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>>> 
>>>>>          Thanks,
>>>>>          Michael
>>>> 
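The idea behind the masked post loop discussed in this thread can be modeled in portable C++. This is an illustrative sketch, not the C2 implementation: `kLanes`, `make_mask`, `masked_add` and `add_arrays` are hypothetical, a plain lane loop stands in for a vector instruction, and a bit mask stands in for the AVX-512 k1 register. `make_mask` corresponds to what `createmsk` computes from the remaining trip count; the post loop then finishes the remainder in one masked iteration instead of a scalar loop of up to `kLanes - 1` iterations. Restoring k1 afterwards is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // e.g. 16 ints in a 512-bit vector

// Mask with the low `remaining` bits set: the lanes still to be processed.
std::uint32_t make_mask(std::size_t remaining) {
  return remaining >= kLanes ? (std::uint32_t{1} << kLanes) - 1
                             : (std::uint32_t{1} << remaining) - 1;
}

// One "vector instruction": a[i] += b[i] for every lane whose mask bit is
// set. Disabled lanes are not read or written.
void masked_add(int* a, const int* b, std::uint32_t mask) {
  assert((mask >> kLanes) == 0);
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (mask & (std::uint32_t{1} << lane)) a[lane] += b[lane];
}

// a[i] += b[i] for i in [0, n): a full-width main loop plus a single
// masked post-loop iteration for the remainder.
void add_arrays(int* a, const int* b, std::size_t n) {
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes)
    masked_add(a + i, b + i, make_mask(kLanes));  // all lanes enabled
  if (i < n)
    masked_add(a + i, b + i, make_mask(n - i));   // one masked iteration
}
```

For n = 37 the main loop runs twice and the post loop once with mask 0x1F, where a scalar post loop would have needed five iterations.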


From vladimir.kozlov at oracle.com  Mon Apr 18 18:32:53 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 18 Apr 2016 11:32:53 -0700
Subject: CR for RFR 8153998
In-Reply-To: <E6511FC8-F543-49F3-BD28-ABCD0D15DFF3@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E62225@FMSMSX102.amr.corp.intel.com>
	<E9185B47-B908-4B29-9D1C-3BF609E5BEF9@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62447@FMSMSX102.amr.corp.intel.com>
	<B8CF6639-C4C3-4DEC-AC08-C1A757FAE694@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62850@FMSMSX102.amr.corp.intel.com>
	<90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com>
	<57102743.8080508@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62A7E@FMSMSX102.amr.corp.intel.com>
	<57102F7D.2090303@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62AFA@FMSMSX102.amr.corp.intel.com>
	<57103B20.1040207@oracle.com>
	<C568518E7B433348B114B6A7122D474756E62B2F@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E62CA1@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E631AF@FMSMSX102.amr.corp.intel.com>
	<E6511FC8-F543-49F3-BD28-ABCD0D15DFF3@oracle.com>
Message-ID: <57152855.2090106@oracle.com>

Testing looks good. I will push it after a few other pushes currently in the
queue.

Thanks,
Vladimir

On 4/18/16 11:25 AM, Christian Thalinger wrote:
> Looks good.
>
>> On Apr 15, 2016, at 6:20 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>
>> Vladimir/Christian:
>>
>> I believe I have addressed all concerns in this update:
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/
>>
>> Regards,
>> Michael
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
>> Sent: Friday, April 15, 2016 2:04 AM
>> To: 'Vladimir Kozlov' <vladimir.kozlov at oracle.com>
>> Cc: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
>> Subject: RE: CR for RFR 8153998
>>
>> Vladimir, the code has been updated and is available at:
>>
>> webrev:
>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Berg, Michael C
>> Sent: Thursday, April 14, 2016 5:54 PM
>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: RE: CR for RFR 8153998
>>
>> OK, now that we understand one another: I have added a size calculation for the new pattern match that accurately maps the emitted size of CountedLoopEnd in the 32-bit and 64-bit .ad files.
>> It will be clean when next you see the code.
>>
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 5:52 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>>
>> On 4/14/16 5:12 PM, Berg, Michael C wrote:
>>> The restore mask is sizeless; it restores k1 to the known global configuration value, which is used automatically throughout code gen.
>>
>> How is it sizeless when it generates a kmovwl() instruction?
>> Do you mean it does not have side effects (no flags modified)?
>>
>> Vladimir
>>
>>>
>>> Ok, I will try the pattern match method.
>>>
>>> Thanks
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, April 14, 2016 5:02 PM
>>> To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger
>>> <christian.thalinger at oracle.com>
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: CR for RFR 8153998
>>>
>>> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>>>> Vladimir,
>>>>
>>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good.  The mask version of the post loop is always clean when we apply the optimization.
>>>>
>>>> I tried something like that early on with CountedLoopEnd.
>>>
>>> In the mach version with the restore mask you should either specify the correct size(X) or not specify it at all, so that it is calculated dynamically (done automatically).
>>> I don't see any side effects for restoremask in your code. What are you talking about?
>>>
>>> I am suggesting something like next:
>>>
>>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>>     predicate(n->has_vect_mask_set());
>>>     match(CountedLoopEnd cop cr);
>>>     effect(USE labl);
>>>
>>>     ins_cost(400);
>>>     format %{ "j$cop     $labl\t# loop end\n\t"
>>>               "restoremask \t# vector mask restore for loops"
>>>            %}
>>>     ins_encode %{
>>>       Label* L = $labl$$label;
>>>       __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>>       __ restoremask();
>>>     %}
>>>     ins_pipe(pipe_jcc);
>>> %}
>>>
>>> Vladimir
>>>
>>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit.  You would still have to add the side effect much like what I did.  I would be adding a flag to the node when we don't need one.  What would you like to do then: process via a flag, or how I do it now?  We would basically be doing it in the same place.
>>>>
>>>> -Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Thursday, April 14, 2016 4:27 PM
>>>> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg,
>>>> Michael C <michael.c.berg at intel.com>
>>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: CR for RFR 8153998
>>>>
>>>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>>>
>>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>>
>>>>>> Christian,
>>>>>> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>>>
>>>>> That's unfortunate but I understand.  I'm fine with it then.
>>>>
>>>> You can try to generate the restore mask on loop exit in a mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after the CreateMaskI node is created. Note that CreateMaskI will only be generated for a clean counted post loop with one vector iteration; there should not be any other branches there.
>>>>
>>>> Vladimir
>>>>
>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C
>>>>>> <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>>
>>>>>>       On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>>       See below for context.
>>>>>>       Regards,
>>>>>>       Michael
>>>>>>       *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>>       *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>>>>       *To:*Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>>>       *Cc:*hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>>       *Subject:*Re: CR for RFR 8153998
>>>>>>
>>>>>>           On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>>           Hi Folks,
>>>>>>
>>>>>>           I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>>>           This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fixup loops that used to take many iterations to complete the work of user loops.
>>>>>>           Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication, i.e. fully enabled EVEX targets.  It delivers up to 2x performance and has been modeled over a large number of loop lengths and loop forms.
>>>>>>           This code was tested as follows (see the JBS entry below):
>>>>>>
>>>>>>           Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>>>
>>>>>>           webrev:
>>>>>>           http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>>>
>>>>>>
>>>>>> +//------------------------------MachMskNode-----------------------------------
>>>>>>
>>>>>>       +// Machine function Msk Node
>>>>>>       +class MachMskNode : public MachIdealNode {
>>>>>>
>>>>>>       Does "Msk" mean mask?  Then we should call it MachMaskNode.
>>>>>>       <MCB> Ok, that's easy enough.
>>>>>>       Also, I don't quite understand why we have:
>>>>>>
>>>>>>       +instruct set_mask(rRegI dst, rRegI src) %{
>>>>>>       +  predicate(VM_Version::supports_avx512vl());
>>>>>>       +  match(Set dst (MaskCreateI src));
>>>>>>       +  effect(TEMP dst);
>>>>>>       +  format %{ "createmsk   $dst, $src" %}
>>>>>>       +  ins_encode %{
>>>>>>       +    __ createmsk($dst$$Register, $src$$Register);
>>>>>>       +  %}
>>>>>>
>>>>>>       but:
>>>>>>
>>>>>>       +  void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>>>       +    MacroAssembler _masm(&cbuf);
>>>>>>       +    __ restoremsk();
>>>>>>       +  }
>>>>>>
>>>>>>       The reason: currently k (mask) registers are not allocated, meaning we have to treat their usage as side effects.
>>>>>>       set_mask takes our remaining iterations as an index and provides a mask in k1 that is applicable to its post loop.
>>>>>>       The subsequent restore places the default value back into k1.  The set_mask rule is posed in such a way that the side-effect value survives optimization.
>>>>>>       The restore is purely a side effect with no produced definition in rule space, since mask registers are not formal definitions.
>>>>>>
>>>>>> Hmm.  So, there is no way we can have a RestoreMaskINode?
>>>>>>
>>>>>>           Thanks,
>>>>>>           Michael
>>>>>
>
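For readers following the masked post-loop discussion above, a scalar C++ emulation may help. It is a sketch under stated assumptions, not the C2 implementation: the lane mask is derived from the remaining trip count (what createmsk is described as doing for k1), and the single masked iteration touches only the live lanes. The lane count of 16 (int32 elements in a 512-bit vector) and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 16 x int32 lanes, as in a 512-bit vector.
constexpr int kLanes = 16;

// Build a lane mask from the remaining iteration count: bit i set => lane i is live.
uint32_t make_lane_mask(int remaining) {
  if (remaining >= kLanes) return 0xFFFFu;
  if (remaining <= 0) return 0u;
  return (1u << remaining) - 1u;
}

// Scalar emulation of one masked vector add: lanes whose mask bit is clear
// are neither read nor written, so no fixup loop is needed for the tail.
void masked_add(int32_t* dst, const int32_t* a, const int32_t* b, uint32_t mask) {
  for (int lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) dst[lane] = a[lane] + b[lane];
  }
}

// Full-width main loop, then a single masked post-loop iteration for the tail.
void add_arrays(int32_t* dst, const int32_t* a, const int32_t* b, std::size_t n) {
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) masked_add(dst + i, a + i, b + i, 0xFFFFu);
  if (i < n) masked_add(dst + i, a + i, b + i, make_lane_mask(static_cast<int>(n - i)));
}
```

The design point mirrors the thread: the tail costs one predicated iteration regardless of how many elements remain, instead of up to 15 scalar iterations.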

From christian.thalinger at oracle.com  Mon Apr 18 18:34:40 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 18 Apr 2016 08:34:40 -1000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
Message-ID: <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>


> On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:
> 
> Hi all
>  
> I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
> This uses -XX:DisableIntrinsic option to achieve the same.
> Could you please review and sponsor this patch.
>  
> Bug-id: 
> https://bugs.openjdk.java.net/browse/JDK-8154473 <https://bugs.openjdk.java.net/browse/JDK-8154473>
> webrev:
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ <http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/>
src/cpu/x86/vm/stubGenerator_x86_64.cpp

+      if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {
       StubRoutines::_dpow = generate_libmPow();
-      StubRoutines::_dtan = generate_libmTan();
+      }
Was removing libmTan on purpose?

>  
> Thanks and regards,
> Vivek

From vivek.r.deshpande at intel.com  Mon Apr 18 18:38:52 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Mon, 18 Apr 2016 18:38:52 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>

Hi Christian

I have added this. Just moved generate_libmTan() after sin and cos generation.
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
  StubRoutines::_dtan = generate_libmTan();
}
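The pattern of guarding each stub generator on whether its intrinsic is available can be modeled in isolation. This is a sketch only: it assumes the disable flag carries a comma-separated list such as "_dpow,_dtan", and the types, helper names and strings are hypothetical stand-ins. The real code queries vmIntrinsics::is_intrinsic_available() and stores code addresses in StubRoutines.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical: parse a comma-separated disable list such as "_dpow,_dtan".
std::set<std::string> parse_disable_list(const std::string& value) {
  std::set<std::string> disabled;
  std::istringstream in(value);
  std::string name;
  while (std::getline(in, name, ',')) {
    if (!name.empty()) disabled.insert(name);
  }
  return disabled;
}

bool is_available(const std::set<std::string>& disabled, const std::string& name) {
  return disabled.count(name) == 0;
}

// Stand-in for StubRoutines: a null entry means "no stub generated, use the fallback path".
struct MathStubs {
  const char* dpow = nullptr;
  const char* dtan = nullptr;
};

// Each stub is generated only if its intrinsic has not been disabled.
MathStubs generate_math_stubs(const std::set<std::string>& disabled) {
  MathStubs stubs;
  if (is_available(disabled, "_dpow")) stubs.dpow = "libmPow stub";
  if (is_available(disabled, "_dtan")) stubs.dtan = "libmTan stub";
  return stubs;
}
```

Keeping one guard per stub, as in the tan fix above, is what makes it easy to spot an accidentally dropped generator during review.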

Regards,
Vivek

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, April 18, 2016 11:35 AM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics


On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com<mailto:vivek.r.deshpande at intel.com>> wrote:

Hi all

I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
This uses -XX:DisableIntrinsic option to achieve the same.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

src/cpu/x86/vm/stubGenerator_x86_64.cpp

+      if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {

       StubRoutines::_dpow = generate_libmPow();

-      StubRoutines::_dtan = generate_libmTan();

+      }
Was removing libmTan on purpose?


Thanks and regards,
Vivek

From christian.thalinger at oracle.com  Mon Apr 18 19:15:06 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 18 Apr 2016 09:15:06 -1000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>
Message-ID: <A58DA55D-CF48-4F0B-B04E-6F229D9A5F92@oracle.com>


> On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:
> 
> Hi Christian
>  
> I have added this. Just moved generate_libmTan() after sin and cos generation.
> if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
>   StubRoutines::_dtan = generate_libmTan();
> }
>  

Sorry, I missed this.  I should have used the browser's search instead of eyeballing it.
src/cpu/x86/vm/macroAssembler_x86.cpp

fp_runtime_fallback is unused now:

cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/
hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp
5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) {
5828:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use);
5833:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use);
5838:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use);

hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp
995:  void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use);

src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp

-      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));
+      mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp));
I understand what it's doing but we are calling the same methods as before.  What has changed?

> Regards,
> Vivek
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
> Sent: Monday, April 18, 2016 11:35 AM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>  
>  
> On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>> wrote:
>  
> Hi all
>  
> I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
> This uses -XX:DisableIntrinsic option to achieve the same.
> Could you please review and sponsor this patch.
>  
> Bug-id: 
> https://bugs.openjdk.java.net/browse/JDK-8154473 <https://bugs.openjdk.java.net/browse/JDK-8154473>
> webrev:
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ <http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/>
>  
> src/cpu/x86/vm/stubGenerator_x86_64.cpp
> 
> +      if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {
>        StubRoutines::_dpow = generate_libmPow();
> -      StubRoutines::_dtan = generate_libmTan();
> +      }
> Was removing libmTan on purpose?
> 
> 
>  
> Thanks and regards,
> Vivek

From vivek.r.deshpande at intel.com  Mon Apr 18 20:28:08 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Mon, 18 Apr 2016 20:28:08 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <A58DA55D-CF48-4F0B-B04E-6F229D9A5F92@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>
	<A58DA55D-CF48-4F0B-B04E-6F229D9A5F92@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com>

Hi Christian

Calling the SharedRuntime function directly clobbers the address in memory that the jump shown below uses after the routine finishes, and we also need to make sure the stack pointer is 16-byte aligned. So I call mathfunc() to take care of that, instead of fp_runtime_fallback(), which has the extra overhead of saving/restoring all the general-purpose and XMM registers.

__ pop(rax);
__ mov(rsp, r13);
__ jmp(rax);
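The 16-byte alignment requirement mentioned above is simple pointer arithmetic. The sketch below assumes one common approach (keep the original stack pointer in a callee-saved register, as the snippet above restores rsp from r13, then round rsp down before the call); the helper names are hypothetical and this is not the mathfunc() implementation.

```cpp
#include <cassert>
#include <cstdint>

// Round a stack pointer down to a 16-byte boundary; this is the arithmetic
// behind the common "and rsp, -16" idiom (illustration only).
std::uintptr_t align_down_16(std::uintptr_t sp) {
  return sp & ~static_cast<std::uintptr_t>(15);
}

// True if sp meets the 16-byte alignment expected before calling into C code on x86_64.
bool is_call_aligned(std::uintptr_t sp) {
  return (sp & 15u) == 0;
}
```

Because the stack grows downward, rounding down never overwrites live data above the original stack pointer; restoring the saved value afterwards undoes the adjustment.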

Regards,
Vivek

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, April 18, 2016 12:15 PM
To: Deshpande, Vivek R
Cc: Vladimir Kozlov; hotspot compiler
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics


On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com<mailto:vivek.r.deshpande at intel.com>> wrote:

Hi Christian

I have added this. Just moved generate_libmTan() after sin and cos generation.
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
  StubRoutines::_dtan = generate_libmTan();
}


Sorry, I missed this.  I should have used the browser's search instead of eyeballing it.
src/cpu/x86/vm/macroAssembler_x86.cpp
fp_runtime_fallback is unused now:

cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/
hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp
5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) {
5828:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use);
5833:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use);
5838:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use);

hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp
995:  void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use);


src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp

-      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));

+      mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp));
I understand what it's doing but we are calling the same methods as before.  What has changed?


Regards,
Vivek

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, April 18, 2016 11:35 AM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com<mailto:vivek.r.deshpande at intel.com>>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov <vladimir.kozlov at oracle.com<mailto:vladimir.kozlov at oracle.com>>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com<mailto:sandhya.viswanathan at intel.com>>
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics


On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com<mailto:vivek.r.deshpande at intel.com>> wrote:

Hi all

I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
This uses -XX:DisableIntrinsic option to achieve the same.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

src/cpu/x86/vm/stubGenerator_x86_64.cpp

+      if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {

       StubRoutines::_dpow = generate_libmPow();

-      StubRoutines::_dtan = generate_libmTan();

+      }
Was removing libmTan on purpose?


Thanks and regards,
Vivek

From jan.civlin at intel.com  Mon Apr 18 21:41:17 2016
From: jan.civlin at intel.com (Civlin, Jan)
Date: Mon, 18 Apr 2016 21:41:17 +0000
Subject: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)
Message-ID: <39F83597C33E5F408096702907E6C4500F16BF0A@ORSMSX104.amr.corp.intel.com>

We would like to contribute the SHA256 AVX2 intrinsic.

This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is 64-bit (LP64) code only.

The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.

Contributor: Jan Civlin.


bug: https://bugs.openjdk.java.net/browse/JDK-8154495
webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/

From jan.civlin at intel.com  Mon Apr 18 21:44:26 2016
From: jan.civlin at intel.com (Civlin, Jan)
Date: Mon, 18 Apr 2016 21:44:26 +0000
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)
Message-ID: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>

== Correction in the subject line ===

We would like to contribute the SHA256 AVX2 intrinsic.

This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is 64-bit (LP64) code only.

The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.

Contributor: Jan Civlin.


bug: https://bugs.openjdk.java.net/browse/JDK-8154495
webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/

From vladimir.kozlov at oracle.com  Tue Apr 19 00:09:10 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 18 Apr 2016 17:09:10 -0700
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
Message-ID: <57157726.4030701@oracle.com>

Hi Jan,

The patch was generated on Windows and has ^M at the end of lines, so I 
can't apply it to our sources.

I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.

Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.

_k256_W[] is the same as _k256[] with repeated groups of 4 values. I would suggest 
generating it dynamically in stubGenerator_x86_64.cpp based on _k256:

StubRoutines::x86::_k256_W_adr = generate_k256_W();
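The suggested derivation can be sketched in plain C++, assuming "repeated 4 values" means that each group of four consecutive _k256 constants is stored twice in a row (once per 128-bit lane of a YMM register). expand_k256_w is a hypothetical name; a real stub would emit the table into the code buffer rather than build a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: duplicate every group of four round constants so that
// one 256-bit load sees the same four constants in both 128-bit lanes.
std::vector<uint32_t> expand_k256_w(const std::vector<uint32_t>& k256) {
  assert(k256.size() % 4 == 0);
  std::vector<uint32_t> out;
  out.reserve(k256.size() * 2);
  for (std::size_t i = 0; i < k256.size(); i += 4) {
    auto first = k256.begin() + static_cast<std::ptrdiff_t>(i);
    for (int copy = 0; copy < 2; ++copy) {
      out.insert(out.end(), first, first + 4);  // append the group once per lane
    }
  }
  return out;
}
```

Deriving the table from _k256 keeps a single source of truth for the SHA-256 constants instead of maintaining two hand-written arrays.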

What testing was done? Did you run with a fastdebug build? I am concerned 
about the size of the new stub and whether the current code_size2 is enough.

Thanks,
Vladimir

On 4/18/16 2:44 PM, Civlin, Jan wrote:
> == Correction in the subject line ===
>
> We would like to contribute the SHA256 AVX2 intrinsic.
>
> This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is 64-bit (LP64) code only.
>
> The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.
>
> Contributor: Jan Civlin.
>
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>

From christian.thalinger at oracle.com  Tue Apr 19 04:33:13 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 18 Apr 2016 18:33:13 -1000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>
	<A58DA55D-CF48-4F0B-B04E-6F229D9A5F92@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com>
Message-ID: <C3E73812-7FDE-4910-B531-C04D5A3BC366@oracle.com>


> On Apr 18, 2016, at 10:28 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:
> 
> Hi Christian
>  
> Calling the SharedRuntime function directly clobbers the address in memory that the jump shown below uses after the routine finishes, and we also need to make sure the stack pointer is 16-byte aligned. So I call mathfunc() to take care of that, instead of fp_runtime_fallback(), which has the extra overhead of saving/restoring all the general-purpose and XMM registers.
>  
> __ pop(rax);
> __ mov(rsp, r13);
> __ jmp(rax);
>  

Ok, that makes sense.

> Regards,
> Vivek
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] 
> Sent: Monday, April 18, 2016 12:15 PM
> To: Deshpande, Vivek R
> Cc: Vladimir Kozlov; hotspot compiler
> Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>  
>  
> On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>> wrote:
>  
> Hi Christian
>  
> I have added this. Just moved generate_libmTan() after sin and cos generation.
> if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
>   StubRoutines::_dtan = generate_libmTan();
> }
>  
>  
> Sorry, I missed this.  I should have used the browser's search instead of eyeballing it.
> src/cpu/x86/vm/macroAssembler_x86.cpp
> 
> fp_runtime_fallback is unused now:
>  
> cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp
> 5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) {
> 5828:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use);
> 5833:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use);
> 5838:      fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use);
>  
> hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp
> 995:  void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use);
> 
> 
> src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp
> 
> -      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));
> +      mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp));
> I understand what it's doing but we are calling the same methods as before.  What has changed?
> 
> 
> Regards,
> Vivek
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
> Sent: Monday, April 18, 2016 11:35 AM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>>; Vladimir Kozlov <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>
> Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>  
>  
> On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R <vivek.r.deshpande at intel.com <mailto:vivek.r.deshpande at intel.com>> wrote:
>  
> Hi all
>  
> I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
> This uses -XX:DisableIntrinsic option to achieve the same.
> Could you please review and sponsor this patch.
>  
> Bug-id: 
> https://bugs.openjdk.java.net/browse/JDK-8154473 <https://bugs.openjdk.java.net/browse/JDK-8154473>
> webrev:
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ <http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/>
>  
> src/cpu/x86/vm/stubGenerator_x86_64.cpp
> 
> +      if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {
>        StubRoutines::_dpow = generate_libmPow();
> -      StubRoutines::_dtan = generate_libmTan();
> +      }
> Was removing libmTan on purpose?
> 
> 
> 
>  
> Thanks and regards,
> Vivek

From volker.simonis at gmail.com  Tue Apr 19 08:46:45 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Tue, 19 Apr 2016 10:46:45 +0200
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
Message-ID: <CA+3eh10C6yhjZWDhjczVwb6oNsM0-sT9ErspN5pyYpxuh0YyfA@mail.gmail.com>

Hi Vivek,

you introduce the new method TemplateInterpreterGenerator::mathfunc()
but only implement it on x86_64. Shouldn't we have at least empty
implementations of this method for all architectures?

Also the description in the bug sounds quite general but you only seem
to implement it for certain math-intrinsics on x64.

Another minor nit: in vmSymbols.hpp I don't think we need the const
qualifier on the ID argument because it is only an enum anyway:

+  static bool is_disabled_by_flags(const vmIntrinsics::ID id);

It makes sense on:

static bool is_disabled_by_flags(const methodHandle& method);

because here we are passing method by reference and the const
qualifier guarantees that is_disabled_by_flags will not change the
method.
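The point about const can be checked with the compiler: top-level const on a by-value parameter is dropped from the function type, while const on the referred-to type of a reference parameter is kept. The struct and enum below are hypothetical stand-ins, not HotSpot's vmIntrinsics::ID or methodHandle.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins for illustration only.
struct FakeIntrinsics { enum ID { _none, _dpow, _dtan }; };
struct Method { int flags; };

// Top-level const on a by-value parameter is not part of the function type...
static_assert(std::is_same<bool(const FakeIntrinsics::ID), bool(FakeIntrinsics::ID)>::value,
              "top-level const is dropped from the signature");
// ...but const on the referred-to type of a reference parameter is.
static_assert(!std::is_same<bool(const Method&), bool(Method&)>::value,
              "const on a reference parameter is part of the signature");

// Hence a declaration without const and a definition with const name the same
// function; the const only stops the body from reassigning its local copy.
bool is_disabled_by_flags(FakeIntrinsics::ID id);
bool is_disabled_by_flags(const FakeIntrinsics::ID id) { return id == FakeIntrinsics::_dtan; }
```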

Regards,
Volker


On Mon, Apr 18, 2016 at 7:38 PM, Deshpande, Vivek R
<vivek.r.deshpande at intel.com> wrote:
> Hi all
>
>
>
> I would like to contribute a patch which helps to control the intrinsics in
> interpreter, c1 and c2 by disabling the stub generation.
>
> This uses -XX:DisableIntrinsic option to achieve the same.
>
> Could you please review and sponsor this patch.
>
>
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154473
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/
>
>
>
> Thanks and regards,
>
> Vivek
>
>

From rwestrel at redhat.com  Tue Apr 19 11:44:35 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 19 Apr 2016 13:44:35 +0200
Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never
	emitted
Message-ID: <57161A23.3050807@redhat.com>

(src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << -shift),
with src an int, have some support in the aarch64.ad file
(rorI_rReg_Var_C_32 and rorI_rReg_Var_C0), but their definitions are broken
and never match any ideal graph subtree.

http://cr.openjdk.java.net/~roland/8154537/webrev.00/
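Both source patterns above are rotate-right idioms because Java uses only the low five bits of an int shift count, so (32 - shift) and -shift select the same left-shift amount. A small C++ sketch, emulating Java's shift semantics with hypothetical helper names, checks this against a reference rotate:

```cpp
#include <cassert>
#include <cstdint>

// Emulate Java int shifts: only the low five bits of the count are used.
uint32_t java_ushr(uint32_t x, int s) { return x >> (static_cast<unsigned>(s) & 31u); }
uint32_t java_shl(uint32_t x, int s) { return x << (static_cast<unsigned>(s) & 31u); }

// (src >>> shift) | (src << (32 - shift))
uint32_t ror_32_minus(uint32_t src, int shift) {
  return java_ushr(src, shift) | java_shl(src, 32 - shift);
}

// (src >>> shift) | (src << -shift)
uint32_t ror_neg(uint32_t src, int shift) {
  return java_ushr(src, shift) | java_shl(src, -shift);
}

// Reference rotate right by shift mod 32 (avoids the undefined shift by 32).
uint32_t rotr32(uint32_t x, int shift) {
  unsigned n = static_cast<unsigned>(shift) & 31u;
  return n == 0 ? x : (x >> n) | (x << (32u - n));
}
```

Since 32 is congruent to 0 modulo 32, both forms reduce to the same masked count, which is why a single ror instruction can implement either tree.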

Roland.

From tobias.hartmann at oracle.com  Tue Apr 19 12:35:43 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 19 Apr 2016 14:35:43 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
Message-ID: <5716261F.1070205@oracle.com>

Hi,

please review the following enhancement:
https://bugs.openjdk.java.net/browse/JDK-6941938

MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().

I added the method MacroAssembler::array_equals_loop(), which loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters when the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out, and compare the remaining elements by shifting out the garbage bits.

We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
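The word-at-a-time loop with an over-read tail can be modeled in portable C++. This sketch assumes both buffers stay readable up to len rounded up to 8 bytes (standing in for the padding the intrinsic relies on), covers only the 8-byte width, and discards the garbage bytes with a byte mask built in memory order rather than with the shift the SPARC code uses, so it behaves the same on either endianness. array_equals_words and its helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Load 8 bytes without alignment or aliasing concerns.
uint64_t load_word(const unsigned char* p) {
  uint64_t w;
  std::memcpy(&w, p, sizeof w);
  return w;
}

// Mask selecting the first n bytes (1 <= n <= 8) in memory order, endian-independent.
uint64_t leading_bytes_mask(std::size_t n) {
  unsigned char bytes[8] = {0};
  std::memset(bytes, 0xFF, n);
  uint64_t m;
  std::memcpy(&m, bytes, sizeof m);
  return m;
}

// Assumes a and b are readable up to len rounded up to a multiple of 8.
bool array_equals_words(const unsigned char* a, const unsigned char* b, std::size_t len) {
  std::size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    if (load_word(a + i) != load_word(b + i)) return false;
  }
  std::size_t rem = len - i;
  if (rem == 0) return true;
  // Read a full word past the end, then ignore the bytes beyond len.
  uint64_t diff = load_word(a + i) ^ load_word(b + i);
  return (diff & leading_bytes_mask(rem)) == 0;
}
```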

Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.

I evaluated the following three versions of the patch.

-- Basic --
http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png

I figured out that this is due to caching. It seems that with 8-byte steps the processor notices later in the loop that we are going to access the subsequent array elements, so the reads trigger costly cache misses that degrade overall performance. With 4-byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem there. Version "prefetching" tries to improve this.

There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
Version "small" tries to improve this.

-- Prefetching --
http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png

However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.

-- Small --
http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays with more than 4 elements (see sheet "small arrays").

The numbers can be found here:
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx

I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. 

What do you think?

Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.

Thanks,
Tobias

[1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
[2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
[3] Microbenchmark results for the "basic" implementation
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
[4] Microbenchmark results for the "prefetching" implementation
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png

From aph at redhat.com  Tue Apr 19 12:52:50 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 19 Apr 2016 13:52:50 +0100
Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are
	never emitted
In-Reply-To: <57161A23.3050807@redhat.com>
References: <57161A23.3050807@redhat.com>
Message-ID: <57162A22.2050706@redhat.com>

On 04/19/2016 12:44 PM, Roland Westrelin wrote:
> (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << -shift),
> with src an int, have some support in the aarch64.ad file
> (rorI_rReg_Var_C_32 and rorI_rReg_Var_C0), but their definitions are broken
> and never match any ideal graph subtree.
> 
> http://cr.openjdk.java.net/~roland/8154537/webrev.00/

OK, thanks.  We'll need backports for
http://hg.openjdk.java.net/aarch64-port/jdk8u/ and
http://hg.openjdk.java.net/aarch64-port/jdk7u/

These should just apply cleanly.

Andrew.

From nils.eliasson at oracle.com  Tue Apr 19 12:54:32 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 19 Apr 2016 14:54:32 +0200
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
Message-ID: <57162A88.7030608@oracle.com>

Hi Vivek,

The changes in is_intrinsic_disabled in compilerDirectives.* are static 
and only access the command-line flag DisableIntrinsic. As long as 
stubs are only generated during startup and don't have a method context, 
that is OK, but the code doesn't belong in the compilerDirectives files if 
it doesn't use directives.

Regards,
Nils

On 2016-04-18 19:38, Deshpande, Vivek R wrote:
>
> Hi all
>
> I would like to contribute a patch which helps to control the 
> intrinsics in interpreter, c1 and c2 by disabling the stub generation.
>
> This uses -XX:DisableIntrinsic option to achieve the same.
>
> Could you please review and sponsor this patch.
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154473
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/
>
> Thanks and regards,
>
> Vivek
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160419/521db99b/attachment.html>

From adinn at redhat.com  Tue Apr 19 12:55:19 2016
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 19 Apr 2016 13:55:19 +0100
Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are
	never emitted
In-Reply-To: <57162A22.2050706@redhat.com>
References: <57161A23.3050807@redhat.com> <57162A22.2050706@redhat.com>
Message-ID: <57162AB7.6010602@redhat.com>

On 19/04/16 13:52, Andrew Haley wrote:
> On 04/19/2016 12:44 PM, Roland Westrelin wrote:
>> (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src <<
>> -shift), with src an int, have some support in the aarch64.ad file:
>> rorI_rReg_Var_C_32 and rorI_rReg_Var_C0, but their definitions are broken
>> and never match any ideal graph subtree.
>>
>> http://cr.openjdk.java.net/~roland/8154537/webrev.00/
> 
> OK, thanks.  We'll need backports for
> http://hg.openjdk.java.net/aarch64-port/jdk8u/ and
> http://hg.openjdk.java.net/aarch64-port/jdk7u/

Patch also looks good to me.

regards,


Andrew Dinn
-----------


From aph at redhat.com  Tue Apr 19 13:19:31 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 19 Apr 2016 14:19:31 +0100
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
	<CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
Message-ID: <57163063.3020506@redhat.com>

On 04/19/2016 01:54 PM, Long Chen wrote:
> Thanks for all these nice comments. Here is a revised version:
> 
> http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch
> 
> 
> Changes:
> 
> 1.       Are DC and IC really synonyms?
>
> DC and IC assembling was supposed to be distinguished by different
> cache_maintenance parameters. I created two enums, 'icache_maintenance' and
> 'dcache_maintenance', in the revised patch to make it look better.
>
> +  enum icache_maintenance {IVAU = 0b0101};
> +  enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b100};
> +  void dc(dcache_maintenance cm, Register Rt) {
> +    sys(0b011, 0b0111, cm, 0b001, Rt);
> +  }
> +
> +  void ic(icache_maintenance cm, Register Rt) {
> +    sys(0b011, 0b0111, cm, 0b001, Rt);
>    }

That looks better, yes.
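A side note on why dc() and ic() can share one body: both are aliases of the A64 SYS instruction, and the only field the two enums change is CRm. The sketch below packs the fields in the order the Arm ARM gives for SYS (op1, CRn, CRm, op2, Rt). `encode_sys` is a hypothetical helper for illustration, not the MacroAssembler's sys().

```cpp
#include <cstdint>

// Hypothetical stand-in for the assembler's sys(): packs the A64 instruction
// "SYS #op1, Cn, Cm, #op2, Xt" into a 32-bit word.
constexpr uint32_t encode_sys(uint32_t op1, uint32_t crn, uint32_t crm,
                              uint32_t op2, uint32_t rt) {
  return 0xD5080000u               // fixed SYS opcode bits (L = 0)
       | (op1 & 0x7u) << 16
       | (crn & 0xFu) << 12
       | (crm & 0xFu) << 8
       | (op2 & 0x7u) << 5
       | (rt & 0x1Fu);
}

// CRm values as in the patch; DC and IC differ only in this field.
enum icache_maintenance : uint32_t { IVAU = 0b0101 };
enum dcache_maintenance : uint32_t { CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b0100 };

// Same op1/CRn/op2 as the patch's sys(0b011, 0b0111, cm, 0b001, Rt) calls.
constexpr uint32_t dc(dcache_maintenance cm, uint32_t rt) {
  return encode_sys(0b011, 0b0111, cm, 0b001, rt);
}
constexpr uint32_t ic(icache_maintenance cm, uint32_t rt) {
  return encode_sys(0b011, 0b0111, cm, 0b001, rt);
}
```

With this layout dc(ZVA, 10) evaluates to 0xD50B742A, the word for "dc zva, x10".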

> 5.       To avoid using an extra scratch register, I wrote a small piece of code
> after the dc zva loop in block_zero, so that block_zero doesn't need to
> fall through to fill_words to zero the remaining small part of the array. This
> code might not perform as well as fill_words (which is unrolled), but it needs
> one fewer register, and the code size becomes smaller as well.
> The final code is like this:
> 
>   0x0000007f7d3dd4fc: cmp       x11, #0x20
>   0x0000007f7d3dd500: b.lt      0x0000007f7d3dd538
>   0x0000007f7d3dd504: neg       x8, x10
>   0x0000007f7d3dd508: and       x8, x8, #0x3f
>   0x0000007f7d3dd50c: cbz       x8, 0x0000007f7d3dd520
>   0x0000007f7d3dd510: sub       x11, x11, x8, asr #3
>   0x0000007f7d3dd514: sub       x8, x8, #0x8
>   0x0000007f7d3dd518: str       xzr, [x10],#8
>   0x0000007f7d3dd51c: cbnz      x8, 0x0000007f7d3dd514
>   0x0000007f7d3dd520: sub       x11, x11, #0x8
>   0x0000007f7d3dd524: dc        zva, x10
>   0x0000007f7d3dd528: subs      x11, x11, #0x8
>   0x0000007f7d3dd52c: add       x10, x10, #0x40
>   0x0000007f7d3dd530: b.ge      0x0000007f7d3dd524
>   0x0000007f7d3dd534: add       x11, x11, #0x8
>   0x0000007f7d3dd538: tbz       w11, #0, 0x0000007f7d3dd544
>   0x0000007f7d3dd53c: str       xzr, [x10],#8
>   0x0000007f7d3dd540: sub       x11, x11, #0x1
>   0x0000007f7d3dd544: cbz       x11, 0x0000007f7d3dd554
>   0x0000007f7d3dd548: sub       x11, x11, #0x2
>   0x0000007f7d3dd54c: stp       xzr, xzr, [x10],#16
>   0x0000007f7d3dd550: cbnz      x11, 0x0000007f7d3dd548
>
> Would this be fine?

It might well be.  I'd like Ed to do a few measurements of large and
small block zeroing.  My guess is that a reasonably small unrolled loop
doing  STP ZR, ZR  will work better than anything else, but we'll see.
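To make the measurement discussion concrete, here is a portable C++ model of the control flow in the disassembly quoted above: align to a 64-byte boundary with single-word stores, zero whole blocks, then finish with one optional single store and a loop of paired stores. It assumes a 64-byte DC ZVA block (matching the #0x3f mask and #0x40 stride in the listing) and uses memset() as a stand-in for one "dc zva". `block_zero_model` is a hypothetical name, not code from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Zero `cnt` 8-byte words starting at `base` (model only, not the patch).
void block_zero_model(uint64_t* base, size_t cnt) {
  const size_t kZvaBytes = 64;  // assumed DC ZVA block size
  const size_t kZvaWords = kZvaBytes / sizeof(uint64_t);
  if (cnt >= 32) {  // cmp x11, #0x20
    // Bytes up to the next 64-byte boundary (neg x8, x10; and x8, x8, #0x3f).
    size_t misalign = (0 - reinterpret_cast<uintptr_t>(base)) & (kZvaBytes - 1);
    cnt -= misalign / sizeof(uint64_t);
    for (; misalign != 0; misalign -= sizeof(uint64_t)) {
      *base++ = 0;  // str xzr, [x10], #8
    }
    while (cnt >= kZvaWords) {  // dc zva loop
      std::memset(base, 0, kZvaBytes);  // stand-in for "dc zva, x10"
      base += kZvaWords;
      cnt -= kZvaWords;
    }
  }
  if (cnt & 1) {  // tbz w11, #0: one odd word
    *base++ = 0;
    --cnt;
  }
  while (cnt != 0) {  // stp xzr, xzr, [x10], #16 loop
    base[0] = 0;
    base[1] = 0;
    base += 2;
    cnt -= 2;
  }
}
```

The small-count path never touches DC ZVA, which is exactly the case where an unrolled STP sequence would be the natural competitor.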

Thanks,

Andrew.

From vladimir.kozlov at oracle.com  Tue Apr 19 16:06:36 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 19 Apr 2016 09:06:36 -0700
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <5716261F.1070205@oracle.com>
References: <5716261F.1070205@oracle.com>
Message-ID: <5716578C.5080902@oracle.com>

Very good. Go with basic. We can do SPU special improvements later if 
needed.

"I also noticed that we don't use prefetching for arraycopy and other 
intrinsics on SPARC."
We do have arraycopy code for it but by default we don't use it:
   product(uintx,  ArraycopySrcPrefetchDistance, 0,
   product(uintx,  ArraycopyDstPrefetchDistance, 0,

Experiments back then did not show an improvement on JBB benchmarks, but 
some workloads may benefit. That is why we keep the code.

Thanks,
Vladimir

On 4/19/16 5:35 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following enhancement:
> https://bugs.openjdk.java.net/browse/JDK-6941938
>
> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword-aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>
> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters when the array sizes are not a multiple of 'byte_width', we simply read past the end of the arrays, bail out, and compare the remaining elements by shifting out the garbage bits.
>
> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>
> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>
> I evaluated the following three versions of the patch.
>
> -- Basic --
> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>
> I figured out that this is due to caching. It seems that with 8-byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4-byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>
> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
> Version "small" tries to improve this.
>
> -- Prefetching --
> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>
> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>
> -- Small --
> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>
> The numbers can be found here:
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>
> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>
> What do you think?
>
> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>
> Thanks,
> Tobias
>
> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
> [3] Microbenchmark results for the "basic" implementation
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
> [4] Microbenchmark results for the "prefetching" implementation
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>
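A scalar model of the "basic" loop described above may help readers follow the tail handling. It is a sketch under stated assumptions, not the SPARC MacroAssembler code: words are assembled big-endian (as SPARC loads see memory), the width is 8 bytes when both addresses are doubleword-aligned and 4 otherwise, and the caller guarantees that both buffers are readable up to the next multiple of the width, which stands in for the over-read the intrinsic relies on. `array_equals_model` and `load_be` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble `width` bytes into a word, most significant byte first,
// the way a big-endian SPARC load sees memory.
static uint64_t load_be(const uint8_t* p, size_t width) {
  uint64_t w = 0;
  for (size_t i = 0; i < width; i++) {
    w = (w << 8) | p[i];
  }
  return w;
}

// Assumed precondition: both buffers are readable up to `len` rounded up to
// the chosen width; bytes past `len` may hold arbitrary garbage.
bool array_equals_model(const uint8_t* a, const uint8_t* b, size_t len) {
  uintptr_t addr_bits = reinterpret_cast<uintptr_t>(a) | reinterpret_cast<uintptr_t>(b);
  const size_t width = (addr_bits & 7) == 0 ? 8 : 4;  // doubleword vs. word aligned

  for (size_t i = 0; i < len; i += width) {
    uint64_t diff = load_be(a + i, width) ^ load_be(b + i, width);
    size_t remaining = len - i;
    if (remaining < width) {
      // Last, partial word: the valid bytes sit in the high-order bits,
      // so shift the garbage bytes out before testing.
      diff >>= (width - remaining) * 8;
    }
    if (diff != 0) {
      return false;  // mismatch found, bail out
    }
  }
  return true;
}
```

Reading the whole last word and discarding bits avoids a separate byte-by-byte tail loop, at the cost of requiring that over-read to be safe.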

From vladimir.kozlov at oracle.com  Tue Apr 19 15:55:19 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 19 Apr 2016 08:55:19 -0700 (PDT)
Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are
	never emitted
In-Reply-To: <57161A23.3050807@redhat.com>
References: <57161A23.3050807@redhat.com>
Message-ID: <571654E7.5020404@oracle.com>

Looks good.

thanks,
Vladimir

On 4/19/16 4:44 AM, Roland Westrelin wrote:
> (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src <<
> -shift), with src an int, have some support in the aarch64.ad file:
> rorI_rReg_Var_C_32 and rorI_rReg_Var_C0, but their definitions are broken
> and never match any ideal graph subtree.
>
> http://cr.openjdk.java.net/~roland/8154537/webrev.00/
>
> Roland.
>
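Why both shapes are rotates: Java masks int shift counts to their low five bits, so `32 - shift` and `-shift` select the same left-shift amount. The C++ model below (our own helper names, not anything from aarch64.ad) reproduces Java's shift semantics explicitly, because plain C++ shifts by 32 or more are undefined, and checks both idioms against a reference 32-bit rotate right. That single operation is what the rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 rules are presumably meant to emit.

```cpp
#include <cstdint>

// Java int shifts: only the low five bits of the count are used.
static uint32_t java_ushr(uint32_t x, int32_t s) { return x >> (static_cast<uint32_t>(s) & 31u); }
static uint32_t java_shl(uint32_t x, int32_t s)  { return x << (static_cast<uint32_t>(s) & 31u); }

// (src >>> shift) | (src << (32 - shift))
uint32_t ror_32_minus(uint32_t src, int32_t shift) {
  return java_ushr(src, shift) | java_shl(src, 32 - shift);
}

// (src >>> shift) | (src << -shift)
uint32_t ror_neg(uint32_t src, int32_t shift) {
  return java_ushr(src, shift) | java_shl(src, -shift);
}

// Reference: 32-bit rotate right by (shift mod 32), as a W-register RORV does.
uint32_t ror_reference(uint32_t src, int32_t shift) {
  uint64_t doubled = (static_cast<uint64_t>(src) << 32) | src;
  return static_cast<uint32_t>(doubled >> (static_cast<uint32_t>(shift) & 31u));
}
```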

From vivek.r.deshpande at intel.com  Tue Apr 19 17:27:42 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Tue, 19 Apr 2016 17:27:42 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <CA+3eh10C6yhjZWDhjczVwb6oNsM0-sT9ErspN5pyYpxuh0YyfA@mail.gmail.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<CA+3eh10C6yhjZWDhjczVwb6oNsM0-sT9ErspN5pyYpxuh0YyfA@mail.gmail.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8118A@ORSMSX106.amr.corp.intel.com>

Hi Volker

Thanks for your review and comments. I will take care of the things you mentioned.
I use mathfunc() for the methods that call SharedRuntime::dexp, dpow, dsin, dcos, dtan, dlog and dlog10 as the alternative path when DisableIntrinsic is used to turn off the LIBM intrinsics.

Thanks and regards,
Vivek


-----Original Message-----
From: Volker Simonis [mailto:volker.simonis at gmail.com] 
Sent: Tuesday, April 19, 2016 1:47 AM
To: Deshpande, Vivek R
Cc: hotspot compiler; Vladimir Kozlov; Christian Thalinger
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

Hi Vivek,

you introduce the new method TemplateInterpreterGenerator::mathfunc()
but only implement it on x86_64. Shouldn't we have at least empty implementations of this method for all architectures?

Also the description in the bug sounds quite general but you only seem to implement it for certain math-intrinsics on x64.

Another minor nit: in vmSymbols.hpp I don't think we need the const qualifier on the ID argument because it is only an enum anyway:

+  static bool is_disabled_by_flags(const vmIntrinsics::ID id);

It makes sense on:

static bool is_disabled_by_flags(const methodHandle& method);

because here we are passing method by reference and the const qualifier guarantees that is_disabled_by_flags will not change the method.
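This point can be checked with standard C++ alone. In the sketch, `IntrinsicId` and `MethodLike` are hypothetical stand-ins for vmIntrinsics::ID and methodHandle. Top-level const on a by-value parameter is dropped from the function's type, so it only constrains the function body, while const on a reference parameter is part of the signature that callers rely on.

```cpp
#include <type_traits>

// Hypothetical stand-ins for vmIntrinsics::ID and methodHandle.
enum IntrinsicId { _none, _dsin, _dcos };
struct MethodLike { bool disabled; };

// The declaration without const and the definition with const name the same
// function: top-level const is not part of the function type.
bool is_disabled_by_flags(IntrinsicId id);
bool is_disabled_by_flags(const IntrinsicId id) { return id == _dsin; }

// For a reference parameter, const changes the type and is a real promise.
bool is_disabled_by_flags(const MethodLike& method) { return method.disabled; }

static_assert(std::is_same<bool(IntrinsicId), bool(const IntrinsicId)>::value,
              "top-level const is dropped from the function type");
```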

Regards,
Volker


On Mon, Apr 18, 2016 at 7:38 PM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:
> Hi all
>
>
>
> I would like to contribute a patch which helps to control the 
> intrinsics in interpreter, c1 and c2 by disabling the stub generation.
>
> This uses -XX:DisableIntrinsic option to achieve the same.
>
> Could you please review and sponsor this patch.
>
>
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154473
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/
>
>
>
> Thanks and regards,
>
> Vivek
>
>

From nils.eliasson at oracle.com  Tue Apr 19 17:13:12 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 19 Apr 2016 10:13:12 -0700 (PDT)
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <5714B5CB.70705@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
	<CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
	<5710E772.5050801@oracle.com>
	<B3A4C7A2-E9B6-403A-BF0A-8B64049F4338@oracle.com>
	<5714B5CB.70705@oracle.com>
Message-ID: <57166728.4060906@oracle.com>


On 2016-04-18 12:24, Nils Eliasson wrote:
> Hi,
>
> On 2016-04-15 22:43, Christian Thalinger wrote:
>>
>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson 
>>> <nils.eliasson at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> On 2016-04-14 20:45, Christian Thalinger wrote:
>>>>
>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson 
>>>>> <nils.eliasson at oracle.com> wrote:
>>>>>
>>>>> I moved the reasons to CompileTask.hpp and put it together with 
>>>>> the names list. Also changed the type from int to CompileReason as 
>>>>> Igor suggested.
>>>>>
>>>>> It gets verbose in the method declarations in compileBroker
>>>>
>>>> Don't worry about this.
>>>>
>>>>> and sometimes I think CompileReason should be declared in 
>>>>> CompileBroker because it is mostly used there. On the other hand, 
>>>>> CompileTask is the keeper of the CompileReason so it makes sense too.
>>>>
>>>> Yes, that's the right place.
>>>>
>>>>>
>>>>> New webrev:
>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ 
>>>>> <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.03/>
>>>>
>>>> +   bool         can_become_stale() const          {
>>>> +     return !_is_blocking && (_compile_reason < Reason_Whitebox);
>>>> +   }
>>>> I'm not a fan of implicit contracts just defined by comments.  This 
>>>> method doesn't seem to be performance critical so I would suggest 
>>>> to use a switch-case.  An attribute on the enum would be much 
>>>> better but we all know this isn't Java.
>>>
>>> As you suggested:
>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04
>>
>> Thanks.  A space is missing and the closing } indent is wrong:
>> +   bool         can_become_stale() const          {
>> +     switch(_compile_reason) {
>> +       case Reason_BackedgeCount:
>> +       case Reason_InvocationCount:
>> +       case Reason_Tiered:
>> +         return !_is_blocking;
>> +       }
>> +     return false;
>> +   }
And I fixed the indentation.

Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/

Thanks!
Nils
>> Also, what about:
>> +       Reason_None,
>> +       Reason_CTW,              // Compile the world
>> +       Reason_Replay,           // ciReplay
>> These were covered before.
> Reason_None - is only used for bounds checking together with Reason_Count.
> Reason_Replay - if these compilations can get stale we can get 
> indeterminism in replay.
> Reason_CTW - CTW could silently drop compiles -> more indeterminism.
>
> Regards,
> Nils
>
>>
>>>
>>> Also made reasons CTW and Replay not stale-able.
>>>
>>> Thanks!
>>> Nils
>>>
>>>>
>>>>>
>>>>> Thanks!
>>>>> Nils
>>>>>
>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>>>>>> Very nice, I like it.
>>>>>>
>>>>>> One note. CompileReason (and its names) should be in the CompileTask
>>>>>> class where it is recorded. Then CompileTask::can_become_stale()
>>>>>> can be in the header file so it is inlined on all platforms.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> New webrev:
>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ 
>>>>>>> <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.02/>
>>>>>>>
>>>>>>> Summary
>>>>>>> Introduced an enum CompileReason with members matching all the old
>>>>>>> variants, and a table containing all the unchanged strings. I see the
>>>>>>> possibility of removing/changing/simplifying some CompileReasons but
>>>>>>> have chosen not to do so in this change.
>>>>>>>
>>>>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>>>>
>>>>>>> Testing:
>>>>>>> Running Testset hotspot on all platforms and hotspot_all on one 
>>>>>>> platform
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nils Eliasson
>>>>>>>
>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>>>>>> Tasks get evicted from the compile_queue if their invocation 
>>>>>>>>> counter
>>>>>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>>>>>
>>>>>>>>> I'll do a proper fix; it is the right thing to do and should be pretty
>>>>>>>>> quick. I'll change the comment to an enum that represents who submitted
>>>>>>>>> the compile, and add a table for the comments. This could be useful in
>>>>>>>>> other settings too.
>>>>>>>>
>>>>>>>> Sounds good.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Nils
>>>>>>>>>
>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>>>>>> What do you mean "stale"?
>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid 
>>>>>>>>>> removing
>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>>>>>
>>>>>>>>>>> Summary:
>>>>>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>>>>>> the compile queue as stale.
>>>>>>>>>>>
>>>>>>>>>>> Solution:
>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>>>>>> stale while the test is running. (Also added some extra
>>>>>>>>>>> checks that may spare us from waiting until timeout for 
>>>>>>>>>>> failing.)
>>>>>>>>>>>
>>>>>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>>>>>> task with info about the origin of the compile. The comment 
>>>>>>>>>>> field has
>>>>>>>>>>> this information - but then it needs to be
>>>>>>>>>>> converted to an enum.
>>>>>>>>>>>
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Nils Eliasson
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>


From christian.thalinger at oracle.com  Tue Apr 19 17:37:09 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 19 Apr 2016 07:37:09 -1000
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <57166728.4060906@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
	<CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
	<5710E772.5050801@oracle.com>
	<B3A4C7A2-E9B6-403A-BF0A-8B64049F4338@oracle.com>
	<5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com>
Message-ID: <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com>


> On Apr 19, 2016, at 7:13 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> 
> 
> On 2016-04-18 12:24, Nils Eliasson wrote:
>> Hi,
>> 
>> On 2016-04-15 22:43, Christian Thalinger wrote:
>>> 
>>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson <nils.eliasson at oracle.com <mailto:nils.eliasson at oracle.com>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> On 2016-04-14 20:45, Christian Thalinger wrote:
>>>>> 
>>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson <nils.eliasson at oracle.com <mailto:nils.eliasson at oracle.com>> wrote:
>>>>>> 
>>>>>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested.
>>>>>> 
>>>>>> It gets verbose in the method declarations in compileBroker
>>>>> 
>>>>> Don't worry about this.
>>>>> 
>>>>>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too.
>>>>> 
>>>>> Yes, that's the right place.
>>>>> 
>>>>>> 
>>>>>> New webrev:
>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.03/>
>>>>> 
>>>>> +   bool         can_become_stale() const          {
>>>>> +     return !_is_blocking && (_compile_reason < Reason_Whitebox);
>>>>> +   }
>>>>> I'm not a fan of implicit contracts just defined by comments.  This method doesn't seem to be performance critical so I would suggest to use a switch-case.  An attribute on the enum would be much better but we all know this isn't Java.
>>>> 
>>>> As you suggested:
>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.04>
>>> 
>>> Thanks.  A space is missing and the closing } indent is wrong:
>>> +   bool         can_become_stale() const          {
>>> +     switch(_compile_reason) {
>>> +       case Reason_BackedgeCount:
>>> +       case Reason_InvocationCount:
>>> +       case Reason_Tiered:
>>> +         return !_is_blocking;
>>> +       }
>>> +     return false;
>>> +   }
> And I fixed the indentation.
> 
> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ <http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/>

+     switch(_compile_reason) {
Space after switch.
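For reference, the switch with the requested space applied, as a standalone sketch that compiles on its own. The enum is a hypothetical, trimmed-down CompileReason holding only the reasons named in this thread, in an assumed order; the webrev's actual enum may differ. The `default:` label is an addition that keeps -Wswitch quiet and is equivalent to falling through to `return false`.

```cpp
// Hypothetical subset of CompileReason; order and membership are assumptions.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,
  Reason_Replay,
  Reason_Whitebox,
  Reason_Count
};

class CompileTaskModel {
 public:
  CompileTaskModel(CompileReason reason, bool is_blocking)
    : _compile_reason(reason), _is_blocking(is_blocking) {}

  // Explicit list instead of relying on enum ordering
  // (the earlier "_compile_reason < Reason_Whitebox" form).
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        return false;
    }
  }

 private:
  CompileReason _compile_reason;
  bool          _is_blocking;
};
```

A reason added later is not stale-able unless someone lists it explicitly, which is the safe direction for Whitebox, CTW and Replay compiles.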

> 
> Thanks!
> Nils
>>> Also, what about:
>>> +       Reason_None,
>>> +       Reason_CTW,              // Compile the world
>>> +       Reason_Replay,           // ciReplay
>>> These were covered before.
>> Reason_None - is only used for bounds checking together with Reason_Count.
>> Reason_Replay - if these compilations can get stale we can get indeterminism in replay.
>> Reason_CTW - CTW could silently drop compiles -> more indeterminism.
>> 
>> Regards,
>> Nils
>> 
>>> 
>>>> 
>>>> Also made reasons CTW and Replay not stale-able. 
>>>> 
>>>> Thanks!
>>>> Nils
>>>> 
>>>>> 
>>>>>> 
>>>>>> Thanks!
>>>>>> Nils
>>>>>> 
>>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>>>>>>> Very nice, I like it.
>>>>>>> 
>>>>>>> One note. CompileReason (and its names) should be in the CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> New webrev:
>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.02/>
>>>>>>>> 
>>>>>>>> Summary
>>>>>>>> Introduced an enum CompileReason with members matching all the old
>>>>>>>> variants, and a table containing all the unchanged strings. I see the
>>>>>>>> possibility of removing/changing/simplifying some CompileReasons but
>>>>>>>> have chosen not to do so in this change.
>>>>>>>> 
>>>>>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>>>>> 
>>>>>>>> Testing:
>>>>>>>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Nils Eliasson
>>>>>>>> 
>>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>>>>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>>>>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>>>>>> 
>>>>>>>>>> I'll do a proper fix; it is the right thing to do and should be pretty
>>>>>>>>>> quick. I'll change the comment to an enum that represents who submitted
>>>>>>>>>> the compile, and add a table for the comments. This could be useful in
>>>>>>>>>> other settings too.
>>>>>>>>> 
>>>>>>>>> Sounds good.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Nils
>>>>>>>>>> 
>>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>>>>>>> What do you mean "stale"?
>>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>> 
>>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>>>>>> 
>>>>>>>>>>>> Summary:
>>>>>>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>>>>>>> the compile queue as stale.
>>>>>>>>>>>> 
>>>>>>>>>>>> Solution:
>>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>>>>>>> stale while the test is running. (Also added some extra
>>>>>>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>>>>>>> 
>>>>>>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>>>>>>> task with info about the origin of the compile. The comment field has
>>>>>>>>>>>> this information - but then it needs to be
>>>>>>>>>>>> converted to an enum.
>>>>>>>>>>>> 
>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 <https://bugs.openjdk.java.net/browse/JDK-8153013>
>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ <http://cr.openjdk.java.net/%7Eneliasso/8153013/webrev.01/>
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Nils Eliasson
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


From tobias.hartmann at oracle.com  Tue Apr 19 17:13:05 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 19 Apr 2016 10:13:05 -0700 (PDT)
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <5716578C.5080902@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
Message-ID: <57166721.5010208@oracle.com>

Thanks, Vladimir!

On 19.04.2016 18:06, Vladimir Kozlov wrote:
> Very good. Go with basic. We can do SPU special improvements later if needed.

Okay, I'll push the basic version.

For reference, here are the results on a SPARC T4:
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png

> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
> We do have arraycopy code for it but by default we don't use it:
>   product(uintx,  ArraycopySrcPrefetchDistance, 0,
>   product(uintx,  ArraycopyDstPrefetchDistance, 0,
> 
> Experiments back then did not show an improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.

Right, but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" forces the prefetch distance to be 0:

java -XX:ArraycopySrcPrefetchDistance=42 -version
ArraycopySrcPrefetchDistance (42) must be 0
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

Thanks,
Tobias

> 
> Thanks,
> Vladimir
> 
> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following enhancement:
>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>
>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword-aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>
>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters when the array sizes are not a multiple of 'byte_width', we simply read past the end of the arrays, bail out, and compare the remaining elements by shifting out the garbage bits.
>>
>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>
>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>
>> I evaluated the following three versions of the patch.
>>
>> -- Basic --
>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>
>> I figured out that this is due to caching. It seems that with 8-byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4-byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4, where there is no performance problem. Version "prefetching" tries to improve this.
>>
>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>> Version "small" tries to improve this.
>>
>> -- Prefetching --
>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>
>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>
>> -- Small --
>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>
>> The numbers can be found here:
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>
>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>
>> What do you think?
>>
>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>
>> Thanks,
>> Tobias
>>
>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>> [3] Microbenchmark results for the "basic" implementation
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>> [4] Microbenchmark results for the "prefetching" implementation
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>
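The array_equals_loop() approach described in the request above (word-sized loads, reading over the end of the arrays, then shifting out the garbage bits) can be sketched in portable C++ as follows. This is a hedged illustration, not the SPARC MacroAssembler code: load_be, choose_width and array_equals_sketch are hypothetical names, big-endian loads are emulated byte by byte to mimic SPARC, and the caller must supply buffers padded to a multiple of 8 bytes so that the over-read stays in bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load 'width' bytes as a big-endian integer (SPARC is big-endian), so the
// first array element lands in the most significant bits.
static uint64_t load_be(const uint8_t* p, size_t width) {
  uint64_t v = 0;
  for (size_t i = 0; i < width; i++) v = (v << 8) | p[i];
  return v;
}

// Hypothetical helper: 8-byte steps if both arrays are doubleword aligned,
// otherwise 4-byte steps (word alignment is assumed to always hold).
static size_t choose_width(const uint8_t* a, const uint8_t* b) {
  uintptr_t bits = reinterpret_cast<uintptr_t>(a) | reinterpret_cast<uintptr_t>(b);
  assert((bits & 3) == 0 && "arrays are expected to be word aligned");
  return (bits & 7) == 0 ? 8 : 4;
}

// Compare 'len' bytes in width-sized chunks. The last chunk may read past
// the end of the arrays; those garbage bytes sit in the low-order bits and
// are shifted out before the final comparison.
bool array_equals_sketch(const uint8_t* a, const uint8_t* b, size_t len) {
  const size_t width = choose_width(a, b);
  size_t i = 0;
  for (; i + width <= len; i += width) {
    if (load_be(a + i, width) != load_be(b + i, width)) return false;
  }
  size_t rem = len - i;  // 0 .. width-1 trailing bytes
  if (rem == 0) return true;
  unsigned garbage_bits = static_cast<unsigned>((width - rem) * 8);
  return (load_be(a + i, width) >> garbage_bits) ==
         (load_be(b + i, width) >> garbage_bits);
}
```

Handling the tail with a shift keeps a single loop for every length; the cost is that memory just past the array end must be readable, which the VM's object alignment presumably guarantees for the real intrinsic.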

From rwestrel at redhat.com  Tue Apr 19 18:54:04 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 19 Apr 2016 20:54:04 +0200
Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are
	never emitted
In-Reply-To: <571654E7.5020404@oracle.com>
References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com>
Message-ID: <57167ECC.8080304@redhat.com>

Thanks everyone for the review. I need a sponsor for that one given it
touches a shared code test case.

Roland.

From vladimir.kozlov at oracle.com  Tue Apr 19 22:12:21 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 19 Apr 2016 15:12:21 -0700 (PDT)
Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are
	never emitted
In-Reply-To: <57167ECC.8080304@redhat.com>
References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com>
	<57167ECC.8080304@redhat.com>
Message-ID: <5716AD45.1060302@oracle.com>

In JPRT.

Vladimir

On 4/19/16 11:54 AM, Roland Westrelin wrote:
> Thanks everyone for the review. I need a sponsor for that one given it
> touches a shared code test case.
>
> Roland.
>

From vivek.r.deshpande at intel.com  Wed Apr 20 00:44:43 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Wed, 20 Apr 2016 00:44:43 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <57162A88.7030608@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<57162A88.7030608@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>

HI Nils

Yes, you are right: the function only accesses the command-line flag DisableIntrinsic, and the changes are static.
Could you point me to the right location for the function?
Also I have updated the webrev with rest of the comments here:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/

Regards,
Vivek

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson
Sent: Tuesday, April 19, 2016 5:55 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

Hi Vivek,

The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command-line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context, that is OK, but the function doesn't belong in the compilerDirectives files if it doesn't use directives.

Regards,
Nils
On 2016-04-18 19:38, Deshpande, Vivek R wrote:
Hi all

I would like to contribute a patch that makes it possible to control the intrinsics in the interpreter, C1 and C2 by disabling stub generation.
It uses the -XX:DisableIntrinsic option to do so.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

Thanks and regards,
Vivek
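As Nils notes above, a check that consults only the command-line value has no method context, so it can be evaluated once while stubs are generated at startup. A minimal sketch, assuming the DisableIntrinsic value is a comma-separated list of intrinsic names; the function name and the parsing are hypothetical and are not HotSpot's implementation:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: decides from the flag value alone whether an
// intrinsic is disabled. No method or compiler directive is consulted,
// which is what makes the check "static".
bool is_intrinsic_disabled_sketch(const std::string& disable_intrinsic_value,
                                  const std::string& intrinsic_name) {
  // A name containing the separator could never match a list entry.
  assert(intrinsic_name.find(',') == std::string::npos);
  std::istringstream list(disable_intrinsic_value);
  std::string entry;
  while (std::getline(list, entry, ',')) {
    if (entry == intrinsic_name) return true;  // exact match only
  }
  return false;
}
```

A directive-aware variant would additionally take a method context, which is why a flag-only helper like this arguably belongs outside compilerDirectives.*.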


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160420/52e2ada9/attachment.html>

From john.r.rose at oracle.com  Wed Apr 20 01:46:21 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 19 Apr 2016 18:46:21 -0700
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <57166721.5010208@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
Message-ID: <B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>

On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Okay, I'll push the basic version.
> 


So I started looking at your code and my inner SPARC junkie took over.

This is what happened:
  http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/

Perhaps there are some ideas that might be helpful:
- The rampdown logic can lose a couple of instructions by using xorcc and movr.
- It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
- Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?

On the other hand, what you wrote is nice and simple.

HTH
-- John

P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
versions of misalignment, still with vectorization, as with the arraycopy stubs.
But that's neither nice nor simple.

> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Thanks, Vladimir!
> 
> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>> Very good. Go with basic. We can do SPU special improvements later if needed.
> 
> Okay, I'll push the basic version.
> 
> For reference, here are the results on a SPARC T4:
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
> 
>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>> We do have arraycopy code for it but by default we don't use it:
>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>> 
>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
> 
> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
> 
> java -XX:ArraycopySrcPrefetchDistance=42 -version
> ArraycopySrcPrefetchDistance (42) must be 0
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit
> 
> Thanks,
> Tobias
> 
>> 
>> Thanks,
>> Vladimir
>> 
>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>> Hi,
>>> 
>>> please review the following enhancement:
>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>> 
>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>> 
>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>> 
>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>> 
>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>> 
>>> I evaluated the following three versions of the patch.
>>> 
>>> -- Basic --
>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>> 
>>> I figured out that this is due to caching. It seems that with 8-byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4-byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4, where there is no performance problem. Version "prefetching" tries to improve this.
>>> 
>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>> Version "small" tries to improve this.
>>> 
>>> -- Prefetching --
>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>> 
>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>> 
>>> -- Small --
>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>> 
>>> The numbers can be found here:
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>> 
>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>> 
>>> What do you think?
>>> 
>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>> 
>>> Thanks,
>>> Tobias
>>> 
>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>> [3] Microbenchmark results for the "basic" implementation
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>> [4] Microbenchmark results for the "prefetching" implementation
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160419/7469e546/attachment-0001.html>

From long.chen at linaro.org  Tue Apr 19 12:54:55 2016
From: long.chen at linaro.org (Long Chen)
Date: Tue, 19 Apr 2016 20:54:55 +0800
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <5714D930.4090804@redhat.com>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
Message-ID: <CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>

Thanks for all these nice comments. Here is a revised version:

http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch


Changes:

1. Are DC and IC really synonyms?

DC and IC were supposed to be distinguished at assembly time by different
cache_maintenance parameters. In the revised patch I created two enums,
'icache_maintenance' and 'dcache_maintenance', to make this clearer.

+  enum icache_maintenance {IVAU = 0b0101};
+  enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b100};
+  void dc(dcache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
+  }
+
+  void ic(icache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
   }

2. I'm not convinced of the value of this. We already know that a simple

  while (count-- > 0) {
    *to++ = v;
  }

turns into a call to memset() which does DC ZVA.

OK. I reverted this change and left it to the compiler. The patch becomes
simpler :)

3. Block_zeroing -> block_zero, 8-byte unit -> HeapWords

4. I don't think this CBZ does anything useful: 0x0000007fa880f630:
cbz   x8, 0x0000007fa880f670

Removed

5. To avoid needing an extra scratch register, I wrote a small piece of code
after the dc zva loop in block_zero, so that block_zero doesn't need to
fall through to fill_words to zero the small part of the array. This code might
not perform as well as fill_words (unrolled), but it requires one less
register, and the code size becomes smaller as well.


The final code is like this:

  0x0000007f7d3dd4fc: cmp       x11, #0x20
  0x0000007f7d3dd500: b.lt      0x0000007f7d3dd538
  0x0000007f7d3dd504: neg       x8, x10
  0x0000007f7d3dd508: and       x8, x8, #0x3f
  0x0000007f7d3dd50c: cbz       x8, 0x0000007f7d3dd520
  0x0000007f7d3dd510: sub       x11, x11, x8, asr #3
  0x0000007f7d3dd514: sub       x8, x8, #0x8
  0x0000007f7d3dd518: str       xzr, [x10],#8
  0x0000007f7d3dd51c: cbnz      x8, 0x0000007f7d3dd514
  0x0000007f7d3dd520: sub       x11, x11, #0x8
  0x0000007f7d3dd524: dc        zva, x10
  0x0000007f7d3dd528: subs      x11, x11, #0x8
  0x0000007f7d3dd52c: add       x10, x10, #0x40
  0x0000007f7d3dd530: b.ge      0x0000007f7d3dd524
  0x0000007f7d3dd534: add       x11, x11, #0x8
  0x0000007f7d3dd538: tbz       w11, #0, 0x0000007f7d3dd544
  0x0000007f7d3dd53c: str       xzr, [x10],#8
  0x0000007f7d3dd540: sub       x11, x11, #0x1
  0x0000007f7d3dd544: cbz       x11, 0x0000007f7d3dd554
  0x0000007f7d3dd548: sub       x11, x11, #0x2
  0x0000007f7d3dd54c: stp       xzr, xzr, [x10],#16
  0x0000007f7d3dd550: cbnz      x11, 0x0000007f7d3dd548
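The control flow of the listing above can be modelled in C++ roughly as follows. This is a sketch under stated assumptions, not the patch itself: the DC ZVA block size is taken to be 64 bytes (matching the #0x3f/#0x40 constants in this listing; the earlier listing quoted below used 128 bytes, and real hardware reports the size in DCZID_EL0), `dc zva` is emulated with memset, and block_zero_sketch is a hypothetical name. Counts are in 8-byte words, as in x11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: DC ZVA zeroes 64-byte blocks, as in the listing above.
constexpr size_t kZvaBytes = 64;
constexpr size_t kZvaWords = kZvaBytes / sizeof(uint64_t);

// Stand-in for "dc zva, p": zero one aligned ZVA block.
static void dc_zva(uint64_t* p) {
  assert(reinterpret_cast<uintptr_t>(p) % kZvaBytes == 0);
  std::memset(p, 0, kZvaBytes);
}

// Hypothetical model of block_zero: zero 'cnt' 8-byte words starting at p.
void block_zero_sketch(uint64_t* p, size_t cnt) {
  if (cnt >= 32) {  // cmp x11, #0x20; b.lt tail
    // Words needed to reach the next ZVA block boundary (neg; and #0x3f).
    size_t head = ((uintptr_t(0) - reinterpret_cast<uintptr_t>(p)) & (kZvaBytes - 1)) / 8;
    cnt -= head;
    while (head-- > 0) *p++ = 0;  // str xzr, [x10], #8
    while (cnt >= kZvaWords) {    // dc zva loop
      dc_zva(p);
      p += kZvaWords;
      cnt -= kZvaWords;
    }
  }
  if (cnt & 1) {  // tbz w11, #0: one odd word first
    *p++ = 0;
    cnt--;
  }
  while (cnt > 0) {  // stp xzr, xzr, [x10], #16
    p[0] = 0;
    p[1] = 0;
    p += 2;
    cnt -= 2;
  }
}
```

Handling the odd word up front is what lets the tail use paired stores with a single counter, so no extra scratch register is needed for the remainder.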


Would this be fine?


Regards

Long
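The dc() and ic() helpers from item 1 both forward to sys(0b011, 0b0111, cm, 0b001, Rt), so they differ only in the CRm field; the separate enums mainly stop a data-cache operation from being passed to ic() and vice versa. The sketch below shows how those operands pack into a 32-bit A64 SYS instruction word. encode_sys, encode_dc and encode_ic are hypothetical names, not the HotSpot assembler API; the bit layout is the A64 SYS encoding (1101010100, L=0, op0=01, op1, CRn, CRm, op2, Rt).

```cpp
#include <cassert>
#include <cstdint>

// A64 "SYS #op1, Cn, Cm, #op2, Xt":
// bits 31..19 = 1101010100001, then op1[18:16] CRn[15:12] CRm[11:8] op2[7:5] Rt[4:0]
uint32_t encode_sys(unsigned op1, unsigned crn, unsigned crm, unsigned op2, unsigned rt) {
  assert(op1 < 8 && crn < 16 && crm < 16 && op2 < 8 && rt < 32);
  return 0xD5080000u | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt;
}

// Same CRm values as the enums in the revised patch.
enum icache_maintenance : unsigned { IVAU = 0b0101 };
enum dcache_maintenance : unsigned { CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b100 };

// Distinct parameter types keep DC and IC operations apart at compile time,
// even though both emit the same SYS form.
uint32_t encode_dc(dcache_maintenance cm, unsigned rt) { return encode_sys(0b011, 0b0111, cm, 0b001, rt); }
uint32_t encode_ic(icache_maintenance cm, unsigned rt) { return encode_sys(0b011, 0b0111, cm, 0b001, rt); }
```

With this layout, `dc zva, x0` encodes as 0xD50B7420 and `ic ivau, x0` as 0xD50B7520.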

On 18 April 2016 at 20:55, Andrew Haley <aph at redhat.com> wrote:

> One other thing.  This is rather a lot of code to emit every time an
> array is created:
>
>  ;; zero_words {
>   0x0000007fa880f5f0: cmp       x11, #0x20
>   0x0000007fa880f5f4: b.lt      0x0000007fa880f62c
>
>   0x0000007fa880f5f8: neg       x8, x10
>   0x0000007fa880f5fc: and       x8, x8, #0x7f
>   0x0000007fa880f600: cbz       x8, 0x0000007fa880f614
>   0x0000007fa880f604: sub       x11, x11, x8, asr #3
>   0x0000007fa880f608: sub       x8, x8, #0x8
>   0x0000007fa880f60c: str       xzr, [x10],#8
>   0x0000007fa880f610: cbnz      x8, 0x0000007fa880f608
>   0x0000007fa880f614: sub       x11, x11, #0x10
>   0x0000007fa880f618: dc        zva, x10
>   0x0000007fa880f61c: subs      x11, x11, #0x10
>   0x0000007fa880f620: add       x10, x10, #0x80
>   0x0000007fa880f624: b.ge      0x0000007fa880f618
>   0x0000007fa880f628: add       x11, x11, #0x10
>
>   0x0000007fa880f62c: and       x8, x11, #0x7
>
> I don't think this CBZ does anything useful:
>
>   0x0000007fa880f630: cbz       x8, 0x0000007fa880f670
>
> (I'm assuming that the 0-7 cases are uniformly distributed.)
>
>   0x0000007fa880f634: sub       x11, x11, x8
>   0x0000007fa880f638: add       x10, x10, x8, lsl #3
>   0x0000007fa880f63c: adr       x9, 0x0000007fa880f670
>   0x0000007fa880f640: sub       x9, x9, x8, lsl #2
>   0x0000007fa880f644: br        x9
>   0x0000007fa880f648: add       x10, x10, #0x40
>   0x0000007fa880f64c: sub       x11, x11, #0x8
>   0x0000007fa880f650: stur      xzr, [x10,#-64]
>   0x0000007fa880f654: stur      xzr, [x10,#-56]
>   0x0000007fa880f658: stur      xzr, [x10,#-48]
>   0x0000007fa880f65c: stur      xzr, [x10,#-40]
>   0x0000007fa880f660: stur      xzr, [x10,#-32]
>   0x0000007fa880f664: stur      xzr, [x10,#-24]
>   0x0000007fa880f668: stur      xzr, [x10,#-16]
>   0x0000007fa880f66c: stur      xzr, [x10,#-8]
>   0x0000007fa880f670: cbnz      x11, 0x0000007fa880f648
>  ;; } zero_words
>
> We could think about moving the large block case into a stub which is
> emitted after the main body of the method, or even into a shared stub.
> A shared stub would require the args to be in fixed registers, though.
>
> Andrew.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160419/49e1c307/attachment.html>

From rwestrel at redhat.com  Wed Apr 20 06:30:53 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 20 Apr 2016 08:30:53 +0200
Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are
	never emitted
In-Reply-To: <5716AD45.1060302@oracle.com>
References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com>
	<57167ECC.8080304@redhat.com> <5716AD45.1060302@oracle.com>
Message-ID: <5717221D.2010107@redhat.com>

> In JPRT.

Thanks!

Roland.

From nils.eliasson at oracle.com  Wed Apr 20 07:46:17 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 20 Apr 2016 09:46:17 +0200
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
	<CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
	<5710E772.5050801@oracle.com>
	<B3A4C7A2-E9B6-403A-BF0A-8B64049F4338@oracle.com>
	<5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com>
	<31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com>
Message-ID: <571733C9.5090302@oracle.com>


On 2016-04-19 19:37, Christian Thalinger wrote:
>
>> On Apr 19, 2016, at 7:13 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>>
>>
>>
>> On 2016-04-18 12:24, Nils Eliasson wrote:
>>> Hi,
>>>
>>> On 2016-04-15 22:43, Christian Thalinger wrote:
>>>>
>>>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson 
>>>>> <nils.eliasson at oracle.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 2016-04-14 20:45, Christian Thalinger wrote:
>>>>>>
>>>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson 
>>>>>>> <nils.eliasson at oracle.com> wrote:
>>>>>>>
>>>>>>> I moved the reasons to CompileTask.hpp and put it together with 
>>>>>>> the names list. Also changed the type from int to CompileReason 
>>>>>>> as Igor suggested.
>>>>>>>
>>>>>>> It gets verbose in the method declarations in compileBroker
>>>>>>
>>>>>> Don't worry about this.
>>>>>>
>>>>>>> and sometimes I think CompileReason should be declared in 
>>>>>>> CompileBroker because it is mostly used there. On the other 
>>>>>>> hand, CompileTask is the keeper of the CompileReason so it makes 
>>>>>>> sense too.
>>>>>>
>>>>>> Yes, that's the right place.
>>>>>>
>>>>>>>
>>>>>>> New webrev:
>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ 
>>>>>>
>>>>>> + bool can_become_stale() const {
>>>>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox);
>>>>>> + }
>>>>>> I'm not a fan of implicit contracts just defined by comments. 
>>>>>>  This method doesn't seem to be performance critical so I would 
>>>>>> suggest to use a switch-case.  An attribute on the enum would be 
>>>>>> much better but we all know this isn't Java.
>>>>>
>>>>> As you suggested:
>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04
>>>>
>>>> Thanks.  A space is missing and the closing } indent is wrong:
>>>> + bool can_become_stale() const {
>>>> + switch(_compile_reason) {
>>>> + case Reason_BackedgeCount:
>>>> + case Reason_InvocationCount:
>>>> + case Reason_Tiered:
>>>> + return !_is_blocking;
>>>> + }
>>>> + return false;
>>>> + }
>> And I fixed the indentation.
>>
>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/
>
> + switch(_compile_reason) {
> Space after switch.

New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.06/

Thanks,
Nils
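The switch-based can_become_stale() discussed in this thread can be sketched as a standalone class. This is a simplified, hypothetical stand-in for CompileTask: only the reasons named in the thread are listed, and their order is illustrative. The point is that staleness is decided by an explicit list of counter- and policy-driven reasons instead of an ordering contract on the enum, so WhiteBox, CTW and Replay compiles are never evicted.

```cpp
#include <cassert>

class CompileTaskSketch {
 public:
  enum CompileReason {
    Reason_None,             // bounds checking only
    Reason_InvocationCount,
    Reason_BackedgeCount,
    Reason_Tiered,
    Reason_CTW,              // Compile the world
    Reason_Replay,           // ciReplay
    Reason_Whitebox,         // WhiteBox API
    Reason_Count             // bounds checking only
  };

  CompileTaskSketch(CompileReason reason, bool is_blocking)
      : _compile_reason(reason), _is_blocking(is_blocking) {
    assert(reason > Reason_None && reason < Reason_Count);
  }

  // Only non-blocking compiles triggered by counters or the tiered policy
  // may be dropped from the queue as stale.
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        break;
    }
    return false;
  }

 private:
  CompileReason _compile_reason;
  bool _is_blocking;
};
```

Adding a new reason then defaults to "never stale" unless it is explicitly listed, which avoids the nondeterminism Nils describes for Replay and CTW.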

>
>>
>> Thanks!
>> Nils
>>>> Also, what about:
>>>> + Reason_None,
>>>> + Reason_CTW, // Compile the world
>>>> + Reason_Replay, // ciReplay
>>>> These were covered before.
>>> Reason_None - is only used for bounds checking together with 
>>> Reason_Count.
>>> Reason_Replay - if these compilations can get stale we can get 
>>> indeterminism in replay.
>>> Reason_CTW - CTW could silently drop compiles -> more indeterminism.
>>>
>>> Regards,
>>> Nils
>>>
>>>>
>>>>>
>>>>> Also made reasons CTW and Replay not stale-able.
>>>>>
>>>>> Thanks!
>>>>> Nils
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Nils
>>>>>>>
>>>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>>>>>>>> Very nice, I like it.
>>>>>>>>
>>>>>>>> One note. CompileReason (and its names) should be in the CompileTask 
>>>>>>>> class, where it is recorded. Then 
>>>>>>>> CompileTask::can_become_stale() can be in the header file so that it is 
>>>>>>>> inlined on all platforms.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> New webrev:
>>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ 
>>>>>>>>>
>>>>>>>>> Summary
>>>>>>>>> Introduced an enum CompileReason with members matching all the old
>>>>>>>>> variants, and a table containing all the unchanged strings. I 
>>>>>>>>> see the
>>>>>>>>> possibility of removing/changing/simplifying some 
>>>>>>>>> CompileReasons but
>>>>>>>>> have chosen not to do so in this change.
>>>>>>>>>
>>>>>>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>>>>>>
>>>>>>>>> Testing:
>>>>>>>>> Running Testset hotspot on all platforms and hotspot_all on 
>>>>>>>>> one platform
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Nils Eliasson
>>>>>>>>>
>>>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>>>>>>>> Tasks get evicted from the compile_queue if their invocation 
>>>>>>>>>>> counter
>>>>>>>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>>>>>>>
>>>>>>>>>>> I'll do a proper fix, it is the right thing to do and should 
>>>>>>>>>>> be pretty
>>>>>>>>>>> quick. I'll change the comment to an enum that represent who 
>>>>>>>>>>> submitted
>>>>>>>>>>> the compile, and add a table for the comments. This could be 
>>>>>>>>>>> useful in
>>>>>>>>>>> other settings too.
>>>>>>>>>>
>>>>>>>>>> Sounds good.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Nils
>>>>>>>>>>>
>>>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>>>>>>>> What do you mean "stale"?
>>>>>>>>>>>> I would prefer to see the real fix as you suggested to 
>>>>>>>>>>>> avoid removing
>>>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>> A method enqueued for compilation with the WB API may be 
>>>>>>>>>>>>> removed from
>>>>>>>>>>>>> the compile queue as stale.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Solution:
>>>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure 
>>>>>>>>>>>>> nothing gets
>>>>>>>>>>>>> stale while the test is running. (Also added some extra
>>>>>>>>>>>>> checks that may spare us from waiting until timeout for 
>>>>>>>>>>>>> failing.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>>>>>>>> task with info about the origin of the compile. The 
>>>>>>>>>>>>> comment field has
>>>>>>>>>>>>> this information - but then it needs to be
>>>>>>>>>>>>> converted to an enum.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Nils Eliasson
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160420/6b14238b/attachment-0001.html>

From vladimir.kempik at oracle.com  Tue Apr 19 16:38:33 2016
From: vladimir.kempik at oracle.com (Vladimir Kempik)
Date: Tue, 19 Apr 2016 09:38:33 -0700 (PDT)
Subject: [8u] RFR 8130309: need to bailout cleanly if
	CompiledStaticCall::emit_to_interp_stub fails when codecache is out of
	space
In-Reply-To: <570B9803.2030509@oracle.com>
References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com>
Message-ID: <57165F09.5050404@oracle.com>

Hello

Can I get a jdk8u reviewer to take a look at it as well?

Thanks, Vladimir.

On 11.04.2016 15:26, Tobias Hartmann wrote:
> Hi Vladimir,
>
> On 11.04.2016 14:00, Vladimir Kempik wrote:
>> Hello
>>
>> Please review this backport of 8130309 to jdk8u.
>>
>> Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope.
>>
>> Testing: jprt, failing test.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309
>> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/
> Looks good to me. Thanks for backporting this!
>
> Best regards,
> Tobias
>
>> Thanks
>> -Vladimir
>>


From tobias.hartmann at oracle.com  Wed Apr 20 08:05:47 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 20 Apr 2016 10:05:47 +0200
Subject: [8u] RFR 8130309: need to bailout cleanly if
	CompiledStaticCall::emit_to_interp_stub fails when codecache is out of
	space
In-Reply-To: <57165F09.5050404@oracle.com>
References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com>
	<57165F09.5050404@oracle.com>
Message-ID: <5717385B.7040505@oracle.com>

Hi Vladimir,

I think this should go to jdk8u-dev (CC'ed) as well.

Best regards,
Tobias

On 19.04.2016 18:38, Vladimir Kempik wrote:
> Hello
> 
> Can I get some jdk8u reviewer to take a look at it as well?
> 
> Thanks, Vladimir.
> 
> On 11.04.2016 15:26, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> On 11.04.2016 14:00, Vladimir Kempik wrote:
>>> Hello
>>>
>>> Please review this backport of 8130309 to jdk8u.
>>>
>>> Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope.
>>>
>>> Testing: jprt, failing test.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309
>>> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/
>> Looks good to me. Thanks for backporting this!
>>
>> Best regards,
>> Tobias
>>
>>> Thanks
>>> -Vladimir
>>>
> 

From jan.civlin at intel.com  Wed Apr 20 10:11:52 2016
From: jan.civlin at intel.com (Civlin, Jan)
Date: Wed, 20 Apr 2016 10:11:52 +0000
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <57157726.4030701@oracle.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
Message-ID: <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>

Vladimir,

Please look at the updated patch at
http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/

I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().

The k256_W table is actually twice the size of k256: each line of k256 (four values) is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
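Deriving k256_W from k256, as suggested in the review, amounts to emitting every row of four round constants twice. A minimal sketch of that expansion follows; make_k256_w is a hypothetical name, and the real stub generator would write the table into memory at startup (the review mentions a generate_k256_W() call), which this sketch does not model. The duplication presumably lets both 128-bit lanes of a 256-bit AVX2 register see the same four constants; that reading is an inference, not something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand k256 into a k256_W-style table: rows of four constants, each row
// written twice in a row, so the result is twice as long as the input.
std::vector<uint32_t> make_k256_w(const std::vector<uint32_t>& k256) {
  assert(k256.size() % 4 == 0);
  std::vector<uint32_t> w;
  w.reserve(k256.size() * 2);
  for (size_t row = 0; row < k256.size(); row += 4) {
    for (int copy = 0; copy < 2; copy++) {
      for (size_t j = 0; j < 4; j++) w.push_back(k256[row + j]);
    }
  }
  return w;
}
```

For the first eight SHA-256 constants the output is K0..K3, K0..K3, K4..K7, K4..K7.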

The patch was tested in three configurations (slowdebug, release and fastdebug) on 64-bit Windows and Linux.

Thank you,

J

[jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 28.756324129 seconds
TestSHA throughput = 356.09558280340946 MB/s

[jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 28.912701124 seconds
TestSHA throughput = 354.1696071938408 MB/s

[jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 29.339789962 seconds
TestSHA throughput = 349.01408678325697 MB/s
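As a quick consistency check, the reported throughput appears to equal msgSize × iters / runtime. This holds if 1 MB is taken as 10^6 bytes and the warm-up iterations are excluded. For the release run:

$$\text{throughput} = \frac{1024\ \text{B} \times 10^{7}}{28.756324129\ \text{s}} \approx 3.561 \times 10^{8}\ \text{B/s} \approx 356.1\ \text{MB/s}$$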


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Monday, April 18, 2016 5:09 PM
To: Civlin, Jan; hotspot compiler
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

Hi Jan,

The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources.

I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.

Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.

_k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:

StubRoutines::x86::_k256_W_adr = generate_k256_W();

What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.

Thanks,
Vladimir

On 4/18/16 2:44 PM, Civlin, Jan wrote:
> == Correction in the subject line ===
>
> We would like to contribute the SHA256 AVX2 intrinsic.
>
> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only.
>
> The patch delivers a 2x performance gain in both latency and throughput, both measured on an average-sized message.
>
> Contributor: Jan Civlin.
>
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>

From tobias.hartmann at oracle.com  Wed Apr 20 13:31:51 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 20 Apr 2016 15:31:51 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
Message-ID: <571784C7.6020304@oracle.com>

Hi John,

On 20.04.2016 03:46, John Rose wrote:
> So I started looking at your code and my inner SPARC junkie took over.
> 
> This is what happened:
>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/

Thanks a lot for having a look!

> Perhaps there are some ideas that might be helpful:
> - The rampdown logic can lose a couple of instructions by using xorcc and movr.

Right, this simplifies the code a bit:
http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/

I did some experiments but, surprisingly, it seems that this does not improve performance but slightly degrades it. See benchmark results on page "webrev.00" of [1]. Any idea why that is?

> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?

I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
http://cr.openjdk.java.net/~thartmann/6941938/webrev.01

Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].

> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).

Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.

What do you think?

Thanks,
Tobias

[1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
[2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
[3] Runtime alignment checks:

bind(Lunaligned);
Label next;
xor3(ary1, ary2, tmp);
and3(tmp, 7, tmp);
br_null_short(tmp, Assembler::pn, next);
STOP("One array is unaligned!");
should_not_reach_here();
bind(next);
STOP("Both arrays are unaligned!");
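The checks in [3] and the 8-byte vs 4-byte choice can be modelled in plain C++ on integer addresses. This is a sketch under two assumptions: array bodies are always at least word (4-byte) aligned, and the helper names are hypothetical rather than MacroAssembler functions. `same_misalignment` computes the same value as the `xor3`/`and3` pair above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 8-byte loads only when both addresses are
// doubleword aligned, otherwise fall back to 4-byte loads.
int compare_width(uintptr_t ary1, uintptr_t ary2) {
  assert(ary1 % 4 == 0 && ary2 % 4 == 0);  // word alignment assumed guaranteed
  return ((ary1 | ary2) & 7) == 0 ? 8 : 4;
}

// Same test as xor3 + and3 in [3]: true when both addresses have the same
// offset modulo 8 (both aligned, or both misaligned by the same amount).
bool same_misalignment(uintptr_t ary1, uintptr_t ary2) {
  return ((ary1 ^ ary2) & 7) == 0;
}
```

With word alignment guaranteed, "both unaligned" means both sit at offset 4 mod 8. That is the both-odd case John mentions, where a single 4-byte step would realign both pointers.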
 
> On the other hand, what you wrote is nice and simple.
> 
> HTH
> -- John
> 
> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
> versions of misalignment, still with vectorization, as with the arraycopy stubs.
> But that's neither nice nor simple.
> 
>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>
>> Thanks, Vladimir!
>>
>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>
>> Okay, I'll push the basic version.
>>
>> For reference, here are the results on a SPARC T4:
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>
>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>> We do have arraycopy code for it but by default we don't use it:
>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>
>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>
>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>
>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>> ArraycopySrcPrefetchDistance (42) must be 0
>> Error: Could not create the Java Virtual Machine.
>> Error: A fatal exception has occurred. Program will exit
>>
>> Thanks,
>> Tobias
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following enhancement:
>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>
>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>
>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>>>
>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>
>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>
>>>> I evaluated the following three versions of the patch.
>>>>
>>>> -- Basic --
>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>
>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>
>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>> Version "small" tries to improve this.
>>>>
>>>> -- Prefetching --
>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>
>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>
>>>> -- Small --
>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>
>>>> The numbers can be found here:
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>
>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>
>>>> What do you think?
>>>>
>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>> [3] Microbenchmark results for the "basic" implementation
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>>>
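The "basic" algorithm in the quoted request compares full 8-byte words, reads past the end for the last partial word, and shifts out the garbage bits. A portable C++ sketch of it follows, with several assumptions. Big-endian (SPARC `ldx`) loads are emulated byte by byte so the shift direction matches SPARC on any host. Reading past the end is only legal here because the caller must supply buffers padded to a multiple of 8 bytes. That padding stands in for the 8-byte-aligned heap arrays the thread relies on, and in C++ an unpadded over-read would be undefined behaviour. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads 8 bytes as one big-endian 64-bit word, as a SPARC ldx would.
static uint64_t load_be64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) {
    v = (v << 8) | p[i];
  }
  return v;
}

// Compares len bytes using 8-byte words. Assumes both buffers have at least
// len rounded up to a multiple of 8 readable bytes.
bool array_equals_8(const uint8_t* a, const uint8_t* b, size_t len) {
  assert(len == 0 || (a != nullptr && b != nullptr));
  size_t words = len / 8;
  for (size_t i = 0; i < words; i++) {
    if (load_be64(a + 8 * i) != load_be64(b + 8 * i)) {
      return false;
    }
  }
  size_t rem = len % 8;  // valid bytes in the last, partial word
  if (rem == 0) {
    return true;
  }
  // The last load reads past the end; XOR leaves only the differing bits.
  uint64_t diff = load_be64(a + 8 * words) ^ load_be64(b + 8 * words);
  // Big-endian: the (8 - rem) bytes past the end are the low-order bytes,
  // so shifting right by that many bytes discards the garbage.
  diff >>= 8 * (8 - rem);
  return diff == 0;
}
```

Using XOR for the tail roughly matches John's xorcc suggestion for the rampdown logic, though the real intrinsic's instruction sequence may differ.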

From tobias.hartmann at oracle.com  Wed Apr 20 13:46:53 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 20 Apr 2016 15:46:53 +0200
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options
Message-ID: <5717884D.2020108@oracle.com>

Hi,

please review the following patch:

https://bugs.openjdk.java.net/browse/JDK-8086068
http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/

The VM crashes in product builds, or fails with an assert in debug builds, if both -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via the command line.

The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
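The fix can be modelled as a small consistency check in C++. This is a sketch only: the enum, struct, function name and warning text are hypothetical and do not correspond to actual HotSpot identifiers or messages.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the VM execution mode and the UseCompiler flag.
enum class ExecMode { Interpreted, Mixed, Compiled };

struct VmFlags {
  ExecMode mode;
  bool use_compiler;
};

// Resets use_compiler when it contradicts interpreter-only mode and returns
// a warning message; returns an empty string if the flags are consistent.
std::string check_compiler_flags(VmFlags& flags) {
  std::string warning;
  if (flags.mode == ExecMode::Interpreted && flags.use_compiler) {
    flags.use_compiler = false;
    warning = "UseCompiler disabled because of -Xint";  // hypothetical text
  }
  // Postcondition: interpreter-only mode never leaves the compiler enabled.
  assert(!(flags.mode == ExecMode::Interpreted && flags.use_compiler));
  return warning;
}
```

Warning and resetting, rather than exiting, keeps a command line with redundant flags usable while protecting the CodeCache assumption.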

Tested with regression test and RBT (running).

Thanks,
Tobias

From zoltan.majo at oracle.com  Wed Apr 20 14:02:43 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Wed, 20 Apr 2016 16:02:43 +0200
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler"
	options
In-Reply-To: <5717884D.2020108@oracle.com>
References: <5717884D.2020108@oracle.com>
Message-ID: <57178C03.1010902@oracle.com>

Hi Tobias,


thank you for taking care of this issue.

There are some other flags that are set to 'false' with -Xint 
(UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). 
Do you know if re-enabling any of those causes problems?

Otherwise it looks good to me.

Best regards,


Zoltan

On 04/20/2016 03:46 PM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8086068
> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/
>
> The VM crashes in product builds, or fails with an assert in debug builds, if both -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via the command line.
>
> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
>
> Tested with regression test and RBT (running).
>
> Thanks,
> Tobias


From tobias.hartmann at oracle.com  Wed Apr 20 14:22:23 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 20 Apr 2016 16:22:23 +0200
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler"
	options
In-Reply-To: <57178C03.1010902@oracle.com>
References: <5717884D.2020108@oracle.com> <57178C03.1010902@oracle.com>
Message-ID: <5717909F.5040004@oracle.com>

Hi Zoltan,

On 20.04.2016 16:02, Zoltán Majó wrote:
> There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). Do you know if re-enabling any of those causes problems?

I checked and combining them with -Xint does not cause any problems because they are guarded by UseCompiler.

> Otherwise it looks good to me.

Thanks for the review!

Best regards,
Tobias

> Best regards,
> 
> 
> Zoltan
> 
> On 04/20/2016 03:46 PM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8086068
>> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/
>>
>> The VM crashes in product builds, or fails with an assert in debug builds, if both -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via the command line.
>>
>> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
>>
>> Tested with regression test and RBT (running).
>>
>> Thanks,
>> Tobias
> 

From zoltan.majo at oracle.com  Wed Apr 20 15:01:18 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Wed, 20 Apr 2016 17:01:18 +0200
Subject: [9] RFR(XS): 8153292:
	AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger
	out-of-heap prefetching
Message-ID: <571799BE.1030203@oracle.com>

Hi,


please review the patch for 8153292.

https://bugs.openjdk.java.net/browse/JDK-8153292


Problem: To avoid out-of-heap accesses by instructions prefetching data, 
TLABs have a reserved area. The size of that area is supposed to be 
large enough to accommodate possible prefetching.

The amount of prefetched data is controlled separately for instance and 
array allocations (by the AllocateInstancePrefetchLines and 
AllocatePrefetchLines flags). The size of the reserved area in the TLAB 
is, however, determined only based on AllocatePrefetchLines. As a 
result, AllocateInstancePrefetchLines > AllocatePrefetchLines can 
trigger out-of-heap memory accesses.


Solution: Set the size of the reserved TLAB area to the MAX of both flags.
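A minimal C++ sketch of the sizing rule follows. It assumes the reserve is computed as prefetch distance plus step size times the number of prefetched lines, in the spirit of the AllocatePrefetchDistance and AllocatePrefetchStepSize flags. The exact HotSpot formula may add extra slack or convert to heap words. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model: the TLAB end reserve must cover the furthest prefetch
// that either the array or the instance allocation path can issue.
size_t prefetch_reserve_bytes(size_t prefetch_distance,
                              size_t prefetch_step_size,
                              size_t array_lines,      // AllocatePrefetchLines
                              size_t instance_lines) { // AllocateInstancePrefetchLines
  const size_t lines = std::max(array_lines, instance_lines);
  const size_t reserve = prefetch_distance + prefetch_step_size * lines;
  // Covers both allocation paths, not only the array one.
  assert(reserve >= prefetch_distance + prefetch_step_size * array_lines);
  assert(reserve >= prefetch_distance + prefetch_step_size * instance_lines);
  return reserve;
}
```

For example, with a distance of 192 bytes, a 64-byte step, 3 array lines and 5 instance lines, sizing by AllocatePrefetchLines alone would reserve 384 bytes. The MAX-based rule reserves 512 bytes instead.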

Webrev:
http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/

Testing:
- JPRT;
- local testing on a solaris_sparc machine.

Thank you!

Best regards,


Zoltan


From edward.nevill at gmail.com  Wed Apr 20 17:08:30 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 20 Apr 2016 18:08:30 +0100
Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <57163063.3020506@redhat.com>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
	<CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
	<57163063.3020506@redhat.com>
Message-ID: <1461172110.2941.63.camel@mylittlepony.linaroharston>

On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote:
> On 04/19/2016 01:54 PM, Long Chen wrote:
> > Would this be fine?
> 
> It might well be.  I'd like Ed to do a few measurements of large and
> small block zeroing.  My guess is that a reasonably small unrolled loop
> doing  STP ZR, ZR  will work better than anything else, but we'll see.

OK. So I started by doing some basic measurements of how long it takes to clear a cache line on hardware from 3 different partners, using 3 different methods.

1) A sequence of str zr, [base, #N] instructions
2) A sequence of stp zr, zr, [base, #N] instructions
3) Using dc zva

Each test was repeated for 3 different memory sizes (100 cache lines, 10000 cache lines and 1E7 cache lines) to simulate the cases where we are hitting L1, L2 and main memory respectively.

The results are here. I have normalised the time for the 100 cache line str to 100 for each partner to avoid disclosing any absolute performance figures.

http://people.linaro.org/~edward.nevill/block_zero/zva.pdf

From this I draw the following conclusions:

Partner X:
- Significant improvement using stp vs str across all block zero sizes
- Significant improvement using dc zva over stp across all sizes
Partner Y:
- Virtually no performance improvement using stp vs str at all sizes
- Significant improvement using dc zva
Partner Z:
- Small improvement using stp vs str on L2-sized clears
- Small improvement using dc zva on L1/L2-sized clears
- Large block zeroing shows no performance difference between str, stp and dc zva
  (this is probably a feature of the external memory system on the partner Z board)

So, guided by this I modified the block zeroing patch as follows

<zero single word to align base to 128 bit aligned address>
if (!small) {
  <zero remainder of first cache line using unrolled stp>
  <zero cache lines using dc zva>
}
<zero tail using unrolled stp>
<zero final word>
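The control flow above can be sketched in portable C++ that works on word indices instead of real addresses. Several things are assumptions, not taken from the patch. Cache lines (ZVA blocks) are taken as 64 bytes (8 words). Index 0 of the buffer is treated as line-aligned. The "small" threshold of two lines is hypothetical. memset stands in for dc zva. The names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kWordsPerLine = 8;                // assumed 64-byte lines
constexpr size_t kSmallWords = 2 * kWordsPerLine;  // hypothetical threshold

// Zeroes cnt words starting at word index base; mem[0] is assumed line-aligned.
void block_zero(uint64_t* mem, size_t base, size_t cnt) {
  // Zero a single word to align base to a 128-bit (2-word) boundary.
  if (cnt > 0 && (base & 1)) { mem[base++] = 0; cnt--; }
  if (cnt >= kSmallWords) {
    assert(base % 2 == 0);
    // Zero the remainder of the first cache line with pairs (stp zr, zr).
    while (base % kWordsPerLine != 0) {
      mem[base] = 0; mem[base + 1] = 0; base += 2; cnt -= 2;
    }
    // Zero whole cache lines (stand-in for dc zva).
    while (cnt >= kWordsPerLine) {
      std::memset(mem + base, 0, kWordsPerLine * sizeof(uint64_t));
      base += kWordsPerLine; cnt -= kWordsPerLine;
    }
  }
  // Zero the tail with pairs.
  while (cnt >= 2) { mem[base] = 0; mem[base + 1] = 0; base += 2; cnt -= 2; }
  // Zero the final word, if any.
  if (cnt == 1) mem[base] = 0;
}
```

The small path skips the line-aligned phase entirely, which is where the measurements above suggest dc zva stops paying off on some hardware.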

Here is the webrev for this

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v03/

I also made a minor modification to Long Chen's v02 patch. In the following code

+  tbz(cnt, 0, store_pair);
+  str(zr, Address(post(base, 8)));
+  sub(cnt, cnt, 1);
+  bind(store_pair);
+  cbz(cnt, done);
+  bind(loop_store_pair);
+  sub(cnt, cnt, 2);
+  stp(zr, zr, Address(post(base, 16)));
+  cbnz(cnt, loop_store_pair);
+  bind(done);

it unnecessarily misaligns the base before continuing to do the stps. We know the base is aligned in the large case because it has just finished clearing cache lines.

I moved the single word zero to the end. The number of instructions is the same. The webrev for this is here.

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04
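The effect of moving the single-word store can be illustrated by counting how many 16-byte pair stores start at an address that is not 16-byte aligned under each ordering. This is a sketch with byte addresses and 8-byte words. The function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Counts stp (16-byte pair) stores that start at a non-16-byte-aligned
// address when zeroing cnt words from byte address base.
// odd_word_first == true models v02 (single str first, then pairs);
// false models v04 (pairs first, single str at the end).
size_t misaligned_pair_stores(size_t base, size_t cnt, bool odd_word_first) {
  assert(base % 8 == 0);  // word-aligned start
  size_t misaligned = 0;
  if (odd_word_first && (cnt & 1)) { base += 8; cnt -= 1; }
  while (cnt >= 2) {
    if (base % 16 != 0) misaligned++;
    base += 16;
    cnt -= 2;
  }
  // In v04 a remaining single word is zeroed here with one str.
  return misaligned;
}
```

Starting from an aligned base with an odd word count, the v02 ordering misaligns every following pair store. The v04 ordering keeps them all aligned with the same instruction count.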

For completeness I also implemented a version using stp only and not using dc zva at all. Webrev here

http://people.linaro.org/~edward.nevill/block_zero/stp

I have tested all of these, including Long Chen's v01 and v02 patches, using jmh as before (http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java)

Results are here, I have normalised the original value in each case to 1E7uS to avoid disclosing any absolute performance figures.

http://people.linaro.org/~edward.nevill/block_zero/zero.pdf

In this

orig - is a clean jdk9/hs-comp build (results normalised to 1E7uS)
stp - is the stp patch above using only stps (no dc zva)
bzero1 - is Long Chen's v01 patch
bzero2 - is Long Chen's v02 patch
bzero3 - is my patch above
bzero4 - is Long Chen's v02 patch with the minor mod to avoid misaligning the stps

From this it looks like bzero3 or bzero4 would be the preferred options, and I would suggest bzero4, as bzero3 is significantly larger.

If people are happy, could I prepare a final changeset for review based on bzero4 (i.e. this one)?

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04

All the best,
Ed.


From vladimir.kozlov at oracle.com  Wed Apr 20 17:58:45 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 10:58:45 -0700
Subject: [8u] RFR 8130309: need to bailout cleanly if
	CompiledStaticCall::emit_to_interp_stub fails when codecache is out of
	space
In-Reply-To: <57165F09.5050404@oracle.com>
References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com>
	<57165F09.5050404@oracle.com>
Message-ID: <5717C355.9080509@oracle.com>

Reviewed. Looks good.

Thanks,
Vladimir

On 4/19/16 9:38 AM, Vladimir Kempik wrote:
> Hello
>
> Can I get some jdk8u reviewer to take a look at it as well?
>
> Thanks, Vladimir.
>
> On 11.04.2016 15:26, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> On 11.04.2016 14:00, Vladimir Kempik wrote:
>>> Hello
>>>
>>> Please review this backport of 8130309 to jdk8u.
>>>
>>> Small changes for jdk8 were applied. AArch64 changes were moved out
>>> of openjdk scope.
>>>
>>> Testing: jprt, failing test.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309
>>> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/
>> Looks good to me. Thanks for backporting this!
>>
>> Best regards,
>> Tobias
>>
>>> Thanks
>>> -Vladimir
>>>
>

From vladimir.kozlov at oracle.com  Wed Apr 20 18:01:21 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 11:01:21 -0700
Subject: [9] RFR(XS): 8153292:
	AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger
	out-of-heap prefetching
In-Reply-To: <571799BE.1030203@oracle.com>
References: <571799BE.1030203@oracle.com>
Message-ID: <5717C3F1.6030809@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/20/16 8:01 AM, Zoltán Majó wrote:
> Hi,
>
>
> please review the patch for 8153292.
>
> https://bugs.openjdk.java.net/browse/JDK-8153292
>
>
> Problem: To avoid out-of-heap accesses by instructions prefetching data,
> TLABs have a reserved area. The size of that area is supposed to be
> large enough to accommodate possible prefetching.
>
> The amount of prefetched data is controlled separately for instance and
> array allocations (by the AllocateInstancePrefetchLines and
> AllocatePrefetchLines flags). The size of the reserved area in the TLAB
> is, however, determined only based on AllocatePrefetchLines. As a
> result, AllocateInstancePrefetchLines > AllocatePrefetchLines can
> trigger out-of-heap memory accesses.
>
>
> Solution: Set the size of the reserved TLAB area to the MAX of both flags.
>
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/
>
> Testing:
> - JPRT;
> - local testing on a solaris_sparc machine.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>

From vladimir.kozlov at oracle.com  Wed Apr 20 18:37:55 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 11:37:55 -0700
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
Message-ID: <5717CC83.3070401@oracle.com>

Looks good to me. I submitted testing on all platforms before integrating.

Thanks,
Vladimir

On 4/20/16 3:11 AM, Civlin, Jan wrote:
> Vladimir,
>
> Please look at the updated patch at
> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>
> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>
> The k256_W table is actually twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>
> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64.
>
> Thank you,
>
> J
>
> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
> provider = SUN
> algorithm = SHA-256
> msgSize = 1024 bytes
> offset = 0
> iters = 10000000
> warmupIters = 20000
> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
> TestSHA runtime = 28.756324129 seconds
> TestSHA throughput = 356.09558280340946 MB/s
>
> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
> provider = SUN
> algorithm = SHA-256
> msgSize = 1024 bytes
> offset = 0
> iters = 10000000
> warmupIters = 20000
> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
> TestSHA runtime = 28.912701124 seconds
> TestSHA throughput = 354.1696071938408 MB/s
>
> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
> provider = SUN
> algorithm = SHA-256
> msgSize = 1024 bytes
> offset = 0
> iters = 10000000
> warmupIters = 20000
> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
> TestSHA runtime = 29.339789962 seconds
> TestSHA throughput = 349.01408678325697 MB/s
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, April 18, 2016 5:09 PM
> To: Civlin, Jan; hotspot compiler
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)
>
> Hi Jan,
>
> The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources.
>
> I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.
>
> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>
> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>
> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>
> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>
> Thanks,
> Vladimir
>
> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>> == Correction in the subject line ===
>>
>> We would like to contribute the SHA256 AVX2 intrinsic.
>>
>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only.
>>
>> The patch delivers a 2x performance gain in both latency and throughput, both measured on an average-sized message.
>>
>> Contributor: Jan Civlin.
>>
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>

From jan.civlin at intel.com  Wed Apr 20 19:07:28 2016
From: jan.civlin at intel.com (Civlin, Jan)
Date: Wed, 20 Apr 2016 19:07:28 +0000
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <5717CC83.3070401@oracle.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
Message-ID: <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>

Thank you!

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, April 20, 2016 11:38 AM
To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

Looks good to me. I submitted testing on all platforms before integrating.

Thanks,
Vladimir

On 4/20/16 3:11 AM, Civlin, Jan wrote:
> Vladimir,
>
> Please look at the updated patch at
> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>
> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>
> The k256_W table is actually twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>
> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64.
>
> Thank you,
>
> J
>
> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
> provider = SUN
> algorithm = SHA-256
> msgSize = 1024 bytes
> offset = 0
> iters = 10000000
> warmupIters = 20000
> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
> TestSHA runtime = 28.756324129 seconds
> TestSHA throughput = 356.09558280340946 MB/s
>
> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
> provider = SUN
> algorithm = SHA-256
> msgSize = 1024 bytes
> offset = 0
> iters = 10000000
> warmupIters = 20000
> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
> TestSHA runtime = 28.912701124 seconds
> TestSHA throughput = 354.1696071938408 MB/s
>
> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
> provider = SUN
> algorithm = SHA-256
> msgSize = 1024 bytes
> offset = 0
> iters = 10000000
> warmupIters = 20000
> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
> TestSHA runtime = 29.339789962 seconds
> TestSHA throughput = 349.01408678325697 MB/s
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, April 18, 2016 5:09 PM
> To: Civlin, Jan; hotspot compiler
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no 
> supports_sha() available)
>
> Hi Jan,
>
> The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources.
>
> I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.
>
> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>
> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>
> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>
> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>
> Thanks,
> Vladimir
>
> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>> == Correction in the subject line ===
>>
>> We would like to contribute the SHA256 AVX2 intrinsic.
>>
>> This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is L64 code only.
>>
>> The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.
>>
>> Contributor: Jan Civlin.
>>
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>
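Per the review and Jan's reply, _k256_W holds the same constants as _k256 with each group of four repeated, presumably so that a 256-bit AVX2 load sees the same four round constants in both 128-bit lanes. A minimal sketch of deriving such a table from k256 in plain C++; `make_k256_W` is a hypothetical helper, not the HotSpot `generate_k256_W()` stub, which emits the data into the code buffer instead:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: every group of four consecutive constants from k256
// appears twice in a row in the result, so the output is twice as long.
std::vector<std::uint32_t> make_k256_W(const std::vector<std::uint32_t>& k256) {
  assert(k256.size() % 4 == 0 && "k256 must hold whole groups of four constants");
  std::vector<std::uint32_t> k256_W;
  k256_W.reserve(k256.size() * 2);
  for (std::size_t i = 0; i < k256.size(); i += 4) {
    const auto first = k256.begin() + static_cast<std::ptrdiff_t>(i);
    // Append the same four-constant group twice.
    for (int copy = 0; copy < 2; ++copy) {
      k256_W.insert(k256_W.end(), first, first + 4);
    }
  }
  return k256_W;
}
```

Applied to the 64 SHA-256 round constants, this yields a 128-entry table in which entries 0-3 and 4-7 are both K[0..3], entries 8-11 and 12-15 are both K[4..7], and so on.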

From nils.eliasson at oracle.com  Wed Apr 20 19:26:32 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 20 Apr 2016 21:26:32 +0200
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<57162A88.7030608@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>
Message-ID: <5717D7E8.5000108@oracle.com>

In vmSymbols.cpp together with the other flag checks.

Regards,
Nils

On 2016-04-20 02:44, Deshpande, Vivek R wrote:
>
> Hi Nils,
>
> Yes, you are right: the function accesses the command-line flag 
> DisableIntrinsic and the changes are static.
>
> Could you point me to the right location for the function?
>
> Also, I have updated the webrev with the rest of the comments here:
>
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/
>
> Regards,
>
> Vivek
>
> From: hotspot-compiler-dev 
> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
> Nils Eliasson
> Sent: Tuesday, April 19, 2016 5:55 AM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to 
> control stub generation and intrinsics
>
> Hi Vivek,
>
> The changes in is_intrinsic_disabled in compilerDirectives.* are 
> static and only access the command-line flag DisableIntrinsic. As 
> long as stubs are only generated during startup and don't have a 
> method context, that is OK, but the code doesn't belong in the 
> compilerDirectives files if it doesn't use directives.
>
> Regards,
> Nils
>
> On 2016-04-18 19:38, Deshpande, Vivek R wrote:
>
>     Hi all
>
>     I would like to contribute a patch that helps control the
>     intrinsics in the interpreter, C1, and C2 by disabling stub generation.
>
>     This uses the -XX:DisableIntrinsic option to achieve this.
>
>     Could you please review and sponsor this patch.
>
>     Bug-id:
>
>     https://bugs.openjdk.java.net/browse/JDK-8154473
>     webrev:
>
>     http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/
>
>     Thanks and regards,
>
>     Vivek
>
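The review above describes is_intrinsic_disabled as a static check that only consults the DisableIntrinsic flag, with no method context. A minimal sketch of such a flag-only lookup, assuming the flag value is a comma-separated list of intrinsic IDs; the function name and string interface are hypothetical and do not reproduce HotSpot's code:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical flag-only check: true if intrinsic_name is one whole entry of
// a comma-separated list such as the value passed to -XX:DisableIntrinsic.
bool is_disabled_by_flag(const std::string& disable_list,
                         const std::string& intrinsic_name) {
  std::istringstream entries(disable_list);
  std::string entry;
  while (std::getline(entries, entry, ',')) {
    if (entry == intrinsic_name) {
      return true;  // exact match on an entry, not a substring match
    }
  }
  return false;
}
```

Because it needs nothing but the flag string, a check like this can run during stub generation at startup, which is why it does not depend on per-method compiler directives.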


From dmitry.dmitriev at oracle.com  Wed Apr 20 19:51:24 2016
From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev)
Date: Wed, 20 Apr 2016 22:51:24 +0300
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler"
	options
In-Reply-To: <5717884D.2020108@oracle.com>
References: <5717884D.2020108@oracle.com>
Message-ID: <5717DDBC.4030909@oracle.com>

Hi Tobias,

I can only comment on the new test: I think you don't need @library 
and @modules for this simple test. No need for a new webrev for that. Thank 
you!

Dmitry

On 20.04.2016 16:46, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8086068
> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/
>
> The VM crashes in product builds, or fails with an assert in debug builds, if both -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although -Xint first disables UseCompiler, the flag may be re-enabled if it is set on the command line.
>
> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
>
> Tested with regression test and RBT (running).
>
> Thanks,
> Tobias
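The fix described above (detect the inconsistent combination, warn, and reset the flag) can be modeled as a post-parsing consistency check. A minimal sketch; the enum, struct, function name, and warning text are all hypothetical and are not HotSpot's Arguments API:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of the VM execution mode and the one flag of interest.
enum class ExecMode { Interpreted, Mixed, Compiled };

struct VmFlags {
  ExecMode mode;
  bool use_compiler;
};

// Runs after command-line parsing. Returns true if a flag had to be reset.
bool check_compiler_consistency(VmFlags& flags) {
  if (flags.mode == ExecMode::Interpreted && flags.use_compiler) {
    // Illustrative message text only.
    std::fprintf(stderr, "warning: UseCompiler is ignored in interpreted-only mode\n");
    flags.use_compiler = false;
    return true;
  }
  return false;
}
```

Resetting the flag, rather than exiting, keeps -Xint authoritative while still telling the user that part of their command line was ignored.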


From nils.eliasson at oracle.com  Wed Apr 20 19:56:54 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 20 Apr 2016 21:56:54 +0200
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <571519BE.605@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com>
	<5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com>
Message-ID: <5717DF06.3090305@oracle.com>

Hi,

Thanks for the help,

I got it to work, and added NoSafePointVerifiers to make sure I hadn't 
missed anything. Then after many test iterations it failed again. It 
didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get 
a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to 
the temporary buffer too, or relax the tag checks in the xml and accept 
that the output may need to be sorted by writer-thread before use. The 
output looks like:

<writer id=1/>
<print_optoassembly>
...
releases tty when blocking on a safepoint
<writer id=2/>
<print_nmethod>
...
<writer id=1/>         // back again after safepoint writing without ttylock now.
</print_optoassembly>  // Here we fail on an assert today when we expect a closing print_nmethod tag
<writer id=2/>
</print_nmethod>

This is malformed xml but has enough information to be reconstructed. 
Would this be an acceptable output?

Regards,
Nils


On 2016-04-18 19:30, Vladimir Kozlov wrote:
> tty would have the same problem, but it uses C_HEAP to allocate:
>
>     defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream();
>
> Please see if you can do something similar.
>
> Thanks,
> Vladimir
>
> On 4/18/16 4:24 AM, Nils Eliasson wrote:
>> Resizable is better, but then we assert on expanding the stringbuffer
>> while being under a different ResourceMark.
>>
>> Regards,
>> Nils
>>
>> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>>> Use resizable stream:
>>>
>>> stringStream(size_t initial_bufsize = 256);
>>>
>>> 1024 may not be enough.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> Please review this fix of print opto_assembly.
>>>>
>>>> Summary:
>>>> The compile log can get corrupted and the VM may assert with "failed:
>>>> bad tag in log".
>>>>
>>>> When printing assembly in output.cpp, we first take the ttylock, print
>>>> the head and then the method metadata. However, the
>>>> metadata printing makes a VM entry and may block for a safepoint, and
>>>> will then release the lock
>>>> (break_tty_lock_for_safepoint). After that, one of the other compiler
>>>> threads that hasn't safepointed will take the lock,
>>>> and the log will be broken when the safepoint is over and the
>>>> first thread starts logging again.
>>>>
>>>> Solution:
>>>> Print the method metadata to a temporary buffer, then take the tty 
>>>> lock.
>>>>
>>>> Testing:
>>>> The repro from the bug no longer fails.
>>>> Running :hotspot_all
>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) 
>>>>
>>>>
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>>
>>>> Regards,
>>>> Nils Eliasson
>>
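The proposed solution ("print the method metadata to a temporary buffer, then take the tty lock") is the general format-first, lock-second pattern. A minimal sketch using `std::mutex` and `std::ostringstream` as stand-ins for HotSpot's tty lock and a resizable stringStream; the function names and the attribute layout of the header line are hypothetical:

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

std::mutex tty_mutex;  // hypothetical stand-in for the tty lock

// Anything that may block (in the VM, a VM entry that can stop at a
// safepoint) happens here, while no lock is held.
std::string format_method_header(const std::string& method_name, int compile_id) {
  std::ostringstream buf;  // grows on demand, like a resizable stringStream
  buf << "<print_optoassembly compile_id='" << compile_id
      << "' method='" << method_name << "'>\n";
  return buf.str();
}

void print_header(std::ostream& out, const std::string& method_name, int compile_id) {
  const std::string text = format_method_header(method_name, compile_id);
  std::lock_guard<std::mutex> guard(tty_mutex);  // short critical section
  out << text;  // only copies a finished string while locked
}
```

Using a growable buffer also sidesteps the concern that a fixed 1024-byte buffer may be too small.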


From vladimir.kozlov at oracle.com  Wed Apr 20 20:04:20 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 13:04:20 -0700
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
Message-ID: <5717E0C4.8050006@oracle.com>

One thing caught during the build is the ',' on the last line of the enum:

+  STACK_SIZE = _RSP      + _RSP_SIZE,
+};

The compiler complains about it, so I removed it in my local repo.

Vladimir

On 4/20/16 12:07 PM, Civlin, Jan wrote:
> Thank you!
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 20, 2016 11:38 AM
> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)
>
> Looks good to me. I submitted testing on all platforms before integrating.
>
> Thanks,
> Vladimir
>
> On 4/20/16 3:11 AM, Civlin, Jan wrote:
>> Vladimir,
>>
>> Please look at the updated patch at
>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>>
>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>>
>> k256_W is actually a table twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>>
>> The patch was tested in three configurations (slowdebug, release, and fastdebug) on 64-bit Windows and Linux.
>>
>> Thank you,
>>
>> J
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java
>> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics
>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar
>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes
>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51
>> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0
>> 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA
>> throughput = 356.09558280340946 MB/s
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java
>> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics
>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar
>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes
>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51
>> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0
>> 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA
>> throughput = 354.1696071938408 MB/s
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java
>> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics
>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar
>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes
>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51
>> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0
>> 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA
>> throughput = 349.01408678325697 MB/s
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, April 18, 2016 5:09 PM
>> To: Civlin, Jan; hotspot compiler
>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>> supports_sha() available)
>>
>> Hi Jan,
>>
>> The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources.
>>
>> I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.
>>
>> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>>
>> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>>
>> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>>
>> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>>> == Correction in the subject line ===
>>>
>>> We would like to contribute the SHA256 AVX2 intrinsic.
>>>
>>> This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is L64 code only.
>>>
>>> The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.
>>>
>>> Contributor: Jan Civlin.
>>>
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>>

From vladimir.kozlov at oracle.com  Wed Apr 20 20:07:44 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 13:07:44 -0700
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <5717DF06.3090305@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com>
	<5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com>
	<5717DF06.3090305@oracle.com>
Message-ID: <5717E190.5070107@oracle.com>

On 4/20/16 12:56 PM, Nils Eliasson wrote:
> Hi,
>
> Thanks for the help,
>
> I got it to work, and added NoSafePointVerifiers to make sure I hadn't
> missed anything. Then after many test iterations it failed again. It
> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get
> a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to
> the temporary buffer too, or relax the tag checks in the xml and accept
> that the output may need to be sorted by writer-thread before use. The
> output looks like:
>
> <writer id=1/>
> <print_optoassembly>
> ...
> releases tty when blocking on a safepoint
> <writer id=2/>
> <print_nmethod>
> ...
> <writer id=1/>         // back again after safepoint writing without ttylock now.
> </print_optoassembly>  // Here we fail on an assert today when we expect a closing print_nmethod tag
> <writer id=2/>
> </print_nmethod>
>
> This is malformed xml but has enough information to be reconstructed.
> Would this be an acceptable output?

Yes, I think it is acceptable: we don't lose information, and it is 
not worse than it was before.

Thanks,
Vladimir

>
> Regards,
> Nils
>
>
> On 2016-04-18 19:30, Vladimir Kozlov wrote:
>> tty would have the same problem, but it uses C_HEAP to allocate:
>>
>>     defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream();
>>
>> Please see if you can do something similar.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/18/16 4:24 AM, Nils Eliasson wrote:
>>> Resizable is better, but then we assert on expanding the stringbuffer
>>> while being under a different ResourceMark.
>>>
>>> Regards,
>>> Nils
>>>
>>> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>>>> Use resizable stream:
>>>>
>>>> stringStream(size_t initial_bufsize = 256);
>>>>
>>>> 1024 may not be enough.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>>>> Hi,
>>>>>
>>>>> Please review this fix of print opto_assembly.
>>>>>
>>>>> Summary:
>>>>> The compile log can get corrupted and the VM may assert with "failed:
>>>>> bad tag in log".
>>>>>
>>>>> When printing assembly in output.cpp, we first take the ttylock, print
>>>>> the head and then the method metadata. However, the
>>>>> metadata printing makes a VM entry and may block for a safepoint, and
>>>>> will then release the lock
>>>>> (break_tty_lock_for_safepoint). After that, one of the other compiler
>>>>> threads that hasn't safepointed will take the lock,
>>>>> and the log will be broken when the safepoint is over and the
>>>>> first thread starts logging again.
>>>>>
>>>>> Solution:
>>>>> Print the method metadata to a temporary buffer, then take the tty
>>>>> lock.
>>>>>
>>>>> Testing:
>>>>> The repro from the bug no longer fails.
>>>>> Running :hotspot_all
>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>>>>>
>>>>>
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>>>
>>>>> Regards,
>>>>> Nils Eliasson
>>>
>
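The interleaved output Nils shows is reconstructable because each `<writer id=N/>` marker says which thread owns the lines that follow. A minimal sketch of "sorting by writer thread before use", assuming the exact `<writer id=N/>` syntax of that sample (real logs may format the marker differently); the function name is hypothetical:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Groups log lines by the writer named in the most recent <writer id=N/>
// marker. Lines before the first marker are attributed to writer -1.
std::map<int, std::vector<std::string>> split_by_writer(const std::string& log) {
  std::map<int, std::vector<std::string>> by_writer;
  std::istringstream in(log);
  std::string line;
  int current = -1;
  const std::string prefix = "<writer id=";
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) == 0) {
      // std::stoi reads the leading digits of "1/>" and stops at '/'.
      current = std::stoi(line.substr(prefix.size()));
      continue;
    }
    by_writer[current].push_back(line);
  }
  return by_writer;
}
```

Applied to the sample, writer 1 gets `<print_optoassembly>` ... `</print_optoassembly>` and writer 2 gets `<print_nmethod>` ... `</print_nmethod>`, so each per-writer stream is well-formed again.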

From jan.civlin at intel.com  Wed Apr 20 20:13:37 2016
From: jan.civlin at intel.com (Civlin, Jan)
Date: Wed, 20 Apr 2016 20:13:37 +0000
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <5717E0C4.8050006@oracle.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
	<5717E0C4.8050006@oracle.com>
Message-ID: <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com>

Thank you, Vladimir.

I guess it was a warning.
I usually keep a comma on the last line of an enum so that I don't need to change existing lines when I add new ones.


Section 6.7.2.2 of C99 lists the syntax as:

enum-specifier:
    enum identifier_opt { enumerator-list }
    enum identifier_opt { enumerator-list , }
    enum identifier
enumerator-list:
    enumerator
    enumerator-list , enumerator
enumerator:
    enumeration-constant
    enumeration-constant = constant-expression


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, April 20, 2016 1:04 PM
To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

One thing caught during the build is the ',' on the last line of the enum:

+  STACK_SIZE = _RSP      + _RSP_SIZE,
+};

The compiler complains about it, so I removed it in my local repo.

Vladimir

On 4/20/16 12:07 PM, Civlin, Jan wrote:
> Thank you!
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 20, 2016 11:38 AM
> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler 
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no 
> supports_sha() available)
>
> Looks good to me. I submitted testing on all platforms before integrating.
>
> Thanks,
> Vladimir
>
> On 4/20/16 3:11 AM, Civlin, Jan wrote:
>> Vladimir,
>>
>> Please look at the updated patch at
>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>>
>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>>
>> k256_W is actually a table twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>>
>> The patch was tested in three configurations (slowdebug, release, and fastdebug) on 64-bit Windows and Linux.
>>
>> Thank you,
>>
>> J
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java
>> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics 
>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar
>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes 
>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 
>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af 
>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA 
>> throughput = 356.09558280340946 MB/s
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java
>> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics 
>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar
>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes 
>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 
>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af 
>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA 
>> throughput = 354.1696071938408 MB/s
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java
>> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics 
>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar
>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes 
>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 
>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af 
>> e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA 
>> throughput = 349.01408678325697 MB/s
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, April 18, 2016 5:09 PM
>> To: Civlin, Jan; hotspot compiler
>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>> supports_sha() available)
>>
>> Hi Jan,
>>
>> The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources.
>>
>> I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.
>>
>> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>>
>> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>>
>> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>>
>> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>>> == Correction in the subject line ===
>>>
>>> We would like to contribute the SHA256 AVX2 intrinsic.
>>>
>>> This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is L64 code only.
>>>
>>> The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.
>>>
>>> Contributor: Jan Civlin.
>>>
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>>

From vladimir.kozlov at oracle.com  Wed Apr 20 20:17:15 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 13:17:15 -0700
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler"
	options
In-Reply-To: <5717909F.5040004@oracle.com>
References: <5717884D.2020108@oracle.com> <57178C03.1010902@oracle.com>
	<5717909F.5040004@oracle.com>
Message-ID: <5717E3CB.6060107@oracle.com>

Another interesting combination is -Xshare:dump with -Xcomp (or 
-XX:+UseCompiler), because -Xshare:dump tries to disable compilation.

I think the fix is good for -Xint -XX:+UseCompiler combination.

Thanks,
Vladimir

On 4/20/16 7:22 AM, Tobias Hartmann wrote:
> Hi Zoltan,
>
> On 20.04.2016 16:02, Zoltán Majó wrote:
>> There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). Do you know if re-enabling any of those causes problems?
>
> I checked and combining them with -Xint does not cause any problems because they are guarded by UseCompiler.
>
>> Otherwise it looks good to me.
>
> Thanks for the review!
>
> Best regards,
> Tobias
>
>> Best regards,
>>
>>
>> Zoltan
>>
>> On 04/20/2016 03:46 PM, Tobias Hartmann wrote:
>>> Hi,
>>>
>>> please review the following patch:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8086068
>>> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/
>>>
>>> The VM crashes in product builds, or fails with an assert in debug builds, if both -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although -Xint first disables UseCompiler, the flag may be re-enabled if it is set on the command line.
>>>
>>> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
>>>
>>> Tested with regression test and RBT (running).
>>>
>>> Thanks,
>>> Tobias
>>

From christian.thalinger at oracle.com  Wed Apr 20 21:47:52 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 20 Apr 2016 11:47:52 -1000
Subject: RFR(S): 8153013: BlockingCompilation test times out
In-Reply-To: <571733C9.5090302@oracle.com>
References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com>
	<570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com>
	<570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com>
	<570F905A.4050202@oracle.com>
	<CCC9CCE9-1183-44DE-AA06-827F22B584E1@oracle.com>
	<5710E772.5050801@oracle.com>
	<B3A4C7A2-E9B6-403A-BF0A-8B64049F4338@oracle.com>
	<5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com>
	<31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com>
	<571733C9.5090302@oracle.com>
Message-ID: <D2E9BD08-E925-4EDF-9BC7-1A9FAFF610BE@oracle.com>

Looks good.

> On Apr 19, 2016, at 9:46 PM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> 
> 
> On 2016-04-19 19:37, Christian Thalinger wrote:
>> 
>>> On Apr 19, 2016, at 7:13 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>>> 
>>> 
>>> 
>>> On 2016-04-18 12:24, Nils Eliasson wrote:
>>>> Hi,
>>>> 
>>>> On 2016-04-15 22:43, Christian Thalinger wrote:
>>>>> 
>>>>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> On 2016-04-14 20:45, Christian Thalinger wrote:
>>>>>>> 
>>>>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>>>>>>>> 
>>>>>>>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested.
>>>>>>>> 
>>>>>>>> It gets verbose in the method declarations in compileBroker
>>>>>>> 
>>>>>>> Don't worry about this.
>>>>>>> 
>>>>>>>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too.
>>>>>>> 
>>>>>>> Yes, that's the right place.
>>>>>>> 
>>>>>>>> 
>>>>>>>> New webrev:
>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/
>>>>>>> 
>>>>>>> +   bool         can_become_stale() const          {
>>>>>>> +     return !_is_blocking && (_compile_reason < Reason_Whitebox);
>>>>>>> +   }
>>>>>>> I'm not a fan of implicit contracts defined only by comments.  This method doesn't seem to be performance critical, so I would suggest using a switch-case.  An attribute on the enum would be much better, but we all know this isn't Java.
>>>>>> 
>>>>>> As you suggested:
>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04
>>>>> 
>>>>> Thanks.  A space is missing and the closing } indent is wrong:
>>>>> +   bool         can_become_stale() const          {
>>>>> +     switch(_compile_reason) {
>>>>> +       case Reason_BackedgeCount:
>>>>> +       case Reason_InvocationCount:
>>>>> +       case Reason_Tiered:
>>>>> +         return !_is_blocking;
>>>>> +       }
>>>>> +     return false;
>>>>> +   }
>>> And I fixed the indentation.
>>> 
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/
>> 
>> +     switch(_compile_reason) {
>> Space after switch.
> 
> New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.06/
> 
> Thanks,
> Nils
> 
>> 
>>> 
>>> Thanks!
>>> Nils
>>>>> Also, what about:
>>>>> +       Reason_None,
>>>>> +       Reason_CTW,              // Compile the world
>>>>> +       Reason_Replay,           // ciReplay
>>>>> These were covered before.
>>>> Reason_None - is only used for bounds checking together with Reason_Count.
>>>> Reason_Replay - if these compilations can get stale we can get indeterminism in replay.
>>>> Reason_CTW - CTW could silently drop compiles -> more indeterminism.
>>>> 
>>>> Regards,
>>>> Nils
>>>> 
>>>>> 
>>>>>> 
>>>>>> Also made reasons CTW and Replay not stale-able. 
>>>>>> 
>>>>>> Thanks!
>>>>>> Nils
>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> Nils
>>>>>>>> 
>>>>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote:
>>>>>>>>> Very nice, I like it.
>>>>>>>>> 
>>>>>>>>> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>> 
>>>>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> New webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/
>>>>>>>>>> 
>>>>>>>>>> Summary
>>>>>>>>>> Introduced an enum CompileReason with members matching all the old
>>>>>>>>>> variants, and a table containing all the unchanged strings. I see the
>>>>>>>>>> possibility of removing/changing/simplifying some CompileReasons but
>>>>>>>>>> have chosen not to do so in this change.
>>>>>>>>>> 
>>>>>>>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>>>>>>> 
>>>>>>>>>> Testing:
>>>>>>>>>> Running Testset hotspot on all platforms and hotspot_all on one platform
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Nils Eliasson
>>>>>>>>>> 
>>>>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote:
>>>>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote:
>>>>>>>>>>>> Tasks get evicted from the compile_queue if their invocation counter
>>>>>>>>>>>> hasn't increased during TieredCompileTaskTimeout.
>>>>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)).
>>>>>>>>>>>> 
>>>>>>>>>>>> I'll do a proper fix; it is the right thing to do and should be pretty
>>>>>>>>>>>> quick. I'll change the comment to an enum that represents who submitted
>>>>>>>>>>>> the compile, and add a table for the comments. This could be useful in
>>>>>>>>>>>> other settings too.
>>>>>>>>>>> 
>>>>>>>>>>> Sounds good.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Nils
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote:
>>>>>>>>>>>>> What do you mean "stale"?
>>>>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid removing
>>>>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Please review this small fix of the BlockingCompilation test.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>> A method enqueued for compilation with the WB API may be removed from
>>>>>>>>>>>>>> the compile queue as stale.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Solution:
>>>>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets
>>>>>>>>>>>>>> stale while the test is running. (Also added some extra
>>>>>>>>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This is a workaround, but we should consider fixing something
>>>>>>>>>>>>>> permanent for WB API compiles - like tagging the compile
>>>>>>>>>>>>>> task with info about the origin of the compile. The comment field has
>>>>>>>>>>>>>> this information - but then it needs to be
>>>>>>>>>>>>>> converted to an enum.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013
>>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Nils Eliasson
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
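Nils's proposal in this thread (replace the free-form compile comment with an enum recording who submitted the compile, plus a table of comment strings, so that WB-submitted tasks are never dropped as stale) might be sketched as below. Every name here (CompileReason, kReasonNames, can_become_stale, is_stale) is hypothetical, and the staleness rule is a simplified reading of "the counter hasn't increased during TieredCompileTaskTimeout", not the actual AdvancedThresholdPolicy::is_stale() code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record of who submitted a compile task (illustrative names).
enum class CompileReason : int {
  None,
  InvocationCount,
  BackedgeCount,
  Tiered,
  Whitebox,  // explicit request through the WhiteBox (WB) API
  Count      // number of reasons; keep last
};

// Comment strings indexed by CompileReason, replacing the free-form comment.
static const char* const kReasonNames[] = {
  "no_reason", "count", "backedge_count", "tiered", "whitebox",
};
static_assert(sizeof(kReasonNames) / sizeof(kReasonNames[0]) ==
                  static_cast<std::size_t>(CompileReason::Count),
              "kReasonNames must have one entry per CompileReason");

inline const char* reason_name(CompileReason r) {
  return kReasonNames[static_cast<int>(r)];
}

// Explicitly requested compiles must never be removed from the queue as stale.
inline bool can_become_stale(CompileReason r) {
  return r != CompileReason::Whitebox;
}

// Simplified staleness rule (assumption): a task is stale if the method's
// counters have not changed for longer than timeout_ms.
inline bool is_stale(CompileReason r, int64_t now_ms,
                     int64_t last_counter_change_ms, int64_t timeout_ms) {
  return can_become_stale(r) && (now_ms - last_counter_change_ms) > timeout_ms;
}
```

With a shape like this, a test would not need to raise -XX:TieredCompileTaskTimeout, because WB tasks are exempt from staleness by construction.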

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160420/f2e3735a/attachment-0001.html>

From vladimir.kozlov at oracle.com  Wed Apr 20 22:51:29 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 15:51:29 -0700
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
	<5717E0C4.8050006@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com>
Message-ID: <571807F1.10105@oracle.com>

Testing is continuing, but it has already found another problem when running 
tests with -XX:UseSSE=2:

#  Internal Error (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.cpp:3693), pid=52652, tid=3587
#  Error: assert(VM_Version::supports_ssse3()) failed

V  [libjvm.dylib+0x4193d7]  report_vm_error(char const*, int, char const*, char const*, ...)+0xcd
V  [libjvm.dylib+0x1eedd2]  Assembler::vpshufb(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e
V  [libjvm.dylib+0x87c237]  MacroAssembler::sha256_AVX2(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297
V  [libjvm.dylib+0xa4dc47]  StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b


Vladimir
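The assert fires because the AVX2 stub emits vpshufb, for which the assembler checks supports_ssse3(), and that check fails under -XX:UseSSE=2. The general rule is to enable a stub only when every feature its code emits is usable. A minimal sketch, assuming a simple bitmask of usable CPU features; the enum values, Sha256Impl, and the exact feature set assumed for the AVX2 stub (AVX2, BMI2, SSSE3) are illustrative, not HotSpot's VM_Version API or the eventual fix.

```cpp
#include <cstdint>

// Hypothetical bitmask of CPU features usable after flags such as -XX:UseSSE
// have been applied. Not HotSpot's VM_Version representation.
enum CpuFeature : uint32_t {
  kSSSE3 = 1u << 0,
  kAVX2  = 1u << 1,
  kBMI2  = 1u << 2,
  kSHA   = 1u << 3,
};

enum class Sha256Impl { Java, ShaNi, Avx2 };

// Assumed requirement set for the AVX2 stub: AVX2 itself, BMI2 (e.g. rorx),
// and SSSE3, because the assembler asserts supports_ssse3() for vpshufb.
constexpr uint32_t kAvx2StubNeeds = kAVX2 | kBMI2 | kSSSE3;

// Choose an implementation only if every feature its code emits is usable.
Sha256Impl choose_sha256(uint32_t usable) {
  if (usable & kSHA) {
    return Sha256Impl::ShaNi;  // requirements of this path are not modeled here
  }
  if ((usable & kAvx2StubNeeds) == kAvx2StubNeeds) {
    return Sha256Impl::Avx2;
  }
  return Sha256Impl::Java;     // fall back to the non-intrinsic code
}
```

Under this rule, a configuration where the SSSE3 bit has been masked off falls back to the Java code instead of generating an instruction the assembler rejects.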

On 4/20/16 1:13 PM, Civlin, Jan wrote:
> Thank you, Vladimir.
>
> I guess it was a warning.
> I usually keep a comma after the last enumerator so that I do not need to change existing lines when I add new ones.
>
>
> Section 6.7.2.2 of C99 lists the syntax as:
>
> enum-specifier:
>      enum identifier_opt { enumerator-list }
>      enum identifier_opt { enumerator-list , }
>      enum identifier
> enumerator-list:
>      enumerator
>      enumerator-list , enumerator
> enumerator:
>      enumeration-constant
>      enumeration-constant = constant-expression
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 20, 2016 1:04 PM
> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)
>
> One thing caught during the build is the ',' after the last enumerator:
>
> +  STACK_SIZE = _RSP      + _RSP_SIZE,
> +};
>
> The compiler complains about it, so I removed it in my local repo.
>
> Vladimir
>
> On 4/20/16 12:07 PM, Civlin, Jan wrote:
>> Thank you!
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, April 20, 2016 11:38 AM
>> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>> supports_sha() available)
>>
>> Looks good to me. I submitted testing on all platforms before integrating.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/20/16 3:11 AM, Civlin, Jan wrote:
>>> Vladimir,
>>>
>>> Please look at the updated patch at
>>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>>>
>>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>>>
>>> The k256_W table is actually twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>>>
>>> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64.
>>>
>>> Thank you,
>>>
>>> J
>>>
>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>> provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000
>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>> TestSHA runtime = 28.756324129 seconds
>>> TestSHA throughput = 356.09558280340946 MB/s
>>>
>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>> provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000
>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>> TestSHA runtime = 28.912701124 seconds
>>> TestSHA throughput = 354.1696071938408 MB/s
>>>
>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>> provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000
>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>> TestSHA runtime = 29.339789962 seconds
>>> TestSHA throughput = 349.01408678325697 MB/s
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Monday, April 18, 2016 5:09 PM
>>> To: Civlin, Jan; hotspot compiler
>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>>> supports_sha() available)
>>>
>>> Hi Jan,
>>>
>>> The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources.
>>>
>>> I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.
>>>
>>> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>>>
>>> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>>>
>>> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>>>
>>> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>>>> === Correction in the subject line ===
>>>>
>>>> We would like to contribute the SHA256 AVX2 intrinsic.
>>>>
>>>> This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is 64-bit (LP64) code only.
>>>>
>>>> The patch delivers a 2x performance gain in both latency and throughput, measured on an average message.
>>>>
>>>> Contributor: Jan Civlin.
>>>>
>>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>>>
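Vladimir suggested generating _k256_W from _k256 instead of storing both tables, and Jan describes k256_W as k256 with each line of four constants repeated twice. A minimal sketch of that transformation, assuming a row width of four 32-bit values and using a hypothetical make_k256_w() that returns a std::vector (a real stub generator would emit the data into the code buffer instead). Presumably the duplication lets one 256-bit load feed the same four constants to both 128-bit lanes when two blocks are processed at once; that rationale is an assumption, not stated in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a k256_W-style table from a k256-style table laid out in rows of four
// 32-bit constants by emitting every row twice. n must be a multiple of 4.
std::vector<uint32_t> make_k256_w(const uint32_t* k, std::size_t n) {
  std::vector<uint32_t> w;
  w.reserve(2 * n);
  for (std::size_t row = 0; row + 4 <= n; row += 4) {
    for (int copy = 0; copy < 2; ++copy) {
      w.insert(w.end(), k + row, k + row + 4);  // same four constants again
    }
  }
  return w;
}
```

For the 64 SHA-256 round constants this yields 128 entries, which matches a table twice the size of k256.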

From vivek.r.deshpande at intel.com  Thu Apr 21 00:06:48 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Thu, 21 Apr 2016 00:06:48 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <5717D7E8.5000108@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<57162A88.7030608@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>
	<5717D7E8.5000108@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84830@ORSMSX106.amr.corp.intel.com>

Hi Nils

I have updated the webrev with all the suggestions.
updated webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/
Thanks for your comments and review.

@Vladimir,
I have taken care of all the comments. Would you please review and sponsor the patch?

Thanks and regards,
Vivek

From: Nils Eliasson [mailto:nils.eliasson at oracle.com]
Sent: Wednesday, April 20, 2016 12:27 PM
To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net
Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

In vmSymbols.cpp together with the other flag checks.

Regards,
Nils
On 2016-04-20 02:44, Deshpande, Vivek R wrote:
Hi Nils

Yes, you are right: the function accesses the command-line flag DisableIntrinsic and the changes are static.
Could you point me to the right location for the function?
Also I have updated the webrev with rest of the comments here:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/

Regards,
Vivek

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson
Sent: Tuesday, April 19, 2016 5:55 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

Hi Vivek,

The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command-line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context, that is OK, but the function doesn't belong in the compilerDirectives files if it doesn't use directives.

Regards,
Nils
On 2016-04-18 19:38, Deshpande, Vivek R wrote:
Hi all

I would like to contribute a patch that helps control intrinsics in the interpreter, C1, and C2 by disabling stub generation.
It uses the -XX:DisableIntrinsic option to achieve this.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

Thanks and regards,
Vivek
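Nils's point is that a check depending only on the DisableIntrinsic command-line flag, with no method context, does not need compiler directives and can live with the other flag checks. A minimal sketch of such a static, flag-only check, assuming the flag value is a comma-separated list of intrinsic names; is_intrinsic_disabled here is a hypothetical free function, not the signature in the webrev, and the example names are for illustration only.

```cpp
#include <cstddef>
#include <string>

// Flag-only check: depends solely on the DisableIntrinsic value, with no
// method context or directive set, so it can run during stub generation at
// startup. Assumes the value is a comma-separated list of intrinsic names.
bool is_intrinsic_disabled(const std::string& flag_value, const std::string& name) {
  std::size_t start = 0;
  while (start <= flag_value.size()) {
    std::size_t comma = flag_value.find(',', start);
    if (comma == std::string::npos) {
      comma = flag_value.size();
    }
    // Match whole entries only, so "_sha256" does not match "_sha256_implCompress".
    if (flag_value.compare(start, comma - start, name) == 0) {
      return true;
    }
    start = comma + 1;
  }
  return false;
}
```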



From vivek.r.deshpande at intel.com  Thu Apr 21 00:09:54 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Thu, 21 Apr 2016 00:09:54 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<57162A88.7030608@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>
	<5717D7E8.5000108@oracle.com> 
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84860@ORSMSX106.amr.corp.intel.com>

Sent out the wrong link by mistake.

updated webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/

Regards
Vivek



From vivek.r.deshpande at intel.com  Thu Apr 21 00:13:29 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Thu, 21 Apr 2016 00:13:29 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<57162A88.7030608@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>
	<5717D7E8.5000108@oracle.com>  
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com>

Hi

This is the correct URL for the updated webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/
Sorry for the spam.

Regards,
Vivek


From tobias.hartmann at oracle.com  Thu Apr 21 06:50:32 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Apr 2016 08:50:32 +0200
Subject: [9] RFR(XS): 8153292:
	AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger
	out-of-heap prefetching
In-Reply-To: <571799BE.1030203@oracle.com>
References: <571799BE.1030203@oracle.com>
Message-ID: <57187838.5070607@oracle.com>

Hi Zoltan,

looks good to me!

Best regards,
Tobias

> On 20.04.2016 17:01, Zoltán Majó wrote:
> Hi,
> 
> 
> please review the patch for 8153292.
> 
> https://bugs.openjdk.java.net/browse/JDK-8153292
> 
> 
> Problem: To avoid out-of-heap accesses by instructions prefetching data, TLABs have a reserved area. The size of that area is supposed to be large enough to accommodate possible prefetching.
> 
> The amount of prefetched data is controlled separately for instance and array allocations (by the AllocateInstancePrefetchLines and AllocatePrefetchLines flags). The size of the reserved area in the TLAB is, however, determined only based on AllocatePrefetchLines. As a result, AllocateInstancePrefetchLines > AllocatePrefetchLines can trigger out-of-heap memory accesses.
> 
> 
> Solution: Set the size of the reserved TLAB area to the MAX of both flags.
> 
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/
> 
> Testing:
> - JPRT;
> - local testing on a solaris_sparc machine.
> 
> Thank you!
> 
> Best regards,
> 
> 
> Zoltan
> 
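The fix sizes the reserved TLAB area from whichever of the two line counts is larger. A minimal sketch of that idea; the struct, the function name, and the reserve formula (prefetch distance plus step size times line count, in bytes) are assumptions for illustration, not the code in the webrev.

```cpp
#include <algorithm>

// Hypothetical stand-ins for the prefetch flags (values in bytes or lines).
struct PrefetchFlags {
  int allocate_prefetch_distance;        // bytes ahead of the allocation top
  int allocate_prefetch_step_size;       // bytes between prefetched lines
  int allocate_prefetch_lines;           // lines prefetched for arrays
  int allocate_instance_prefetch_lines;  // lines prefetched for instances
};

// Bytes to keep in reserve at the end of a TLAB so that the furthest prefetch
// for either allocation kind stays inside the heap. Using only
// allocate_prefetch_lines would under-reserve whenever the instance count is larger.
int tlab_prefetch_reserve_bytes(const PrefetchFlags& f) {
  const int lines = std::max(f.allocate_prefetch_lines,
                             f.allocate_instance_prefetch_lines);
  return f.allocate_prefetch_distance + f.allocate_prefetch_step_size * lines;
}
```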

From zoltan.majo at oracle.com  Thu Apr 21 07:26:39 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 21 Apr 2016 09:26:39 +0200
Subject: [9] RFR(XS): 8153292:
	AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger
	out-of-heap prefetching
In-Reply-To: <57187838.5070607@oracle.com>
References: <571799BE.1030203@oracle.com> <57187838.5070607@oracle.com>
Message-ID: <571880AF.2010608@oracle.com>

Thank you, Vladimir and Tobias, for the review!

Best regards,


Zoltan

On 04/21/2016 08:50 AM, Tobias Hartmann wrote:
> Hi Zoltan,
>
> looks good to me!
>
> Best regards,
> Tobias
>
>> On 20.04.2016 17:01, Zoltán Majó wrote:
>> Hi,
>>
>>
>> please review the patch for 8153292.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153292
>>
>>
>> Problem: To avoid out-of-heap accesses by instructions prefetching data, TLABs have a reserved area. The size of that area is supposed to be large enough to accommodate possible prefetching.
>>
>> The amount of prefetched data is controlled separately for instance and array allocations (by the AllocateInstancePrefetchLines and AllocatePrefetchLines flags). The size of the reserved area in the TLAB is, however, determined only based on AllocatePrefetchLines. As a result, AllocateInstancePrefetchLines > AllocatePrefetchLines can trigger out-of-heap memory accesses.
>>
>>
>> Solution: Set the size of the reserved TLAB area to the MAX of both flags.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/
>>
>> Testing:
>> - JPRT;
>> - local testing on a solaris_sparc machine.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>


From tobias.hartmann at oracle.com  Thu Apr 21 07:34:21 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Apr 2016 09:34:21 +0200
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler"
	options
In-Reply-To: <5717DDBC.4030909@oracle.com>
References: <5717884D.2020108@oracle.com> <5717DDBC.4030909@oracle.com>
Message-ID: <5718827D.5070200@oracle.com>

Hi Dmitry,

On 20.04.2016 21:51, Dmitry Dmitriev wrote:
> Hi Tobias,
> 
> I can comment only on the new test: I think you don't need @library and @modules for this simple test. No need for a new webrev for that. Thank you!

Right, I will remove the tags before pushing. Thanks for the review!

Best regards,
Tobias

> Dmitry
> 
> On 20.04.2016 16:46, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8086068
>> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/
>>
>> The VM crashes in product builds, or fails with an assert in debug builds, if both -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that no compilation will be triggered if Arguments::mode() == _int. Although -Xint first disables UseCompiler, the flag may be re-enabled if it is set on the command line.
>>
>> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
>>
>> Tested with regression test and RBT (running).
>>
>> Thanks,
>> Tobias
> 
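The described fix (detect -Xint combined with UseCompiler, warn, and reset the flag) can be sketched as follows. The types are hypothetical stand-ins for HotSpot's Arguments and flag machinery, and the warning text is made up for illustration.

```cpp
#include <cstdio>

enum class ExecMode { Int, Mixed, Comp };

// Hypothetical stand-in for the relevant argument state.
struct CompilerFlags {
  ExecMode mode;
  bool use_compiler;
};

// -Xint clears UseCompiler, but an explicit -XX:+UseCompiler can turn it back
// on. Code that trusts mode == Int to mean "no compilations" then breaks, so
// detect the combination after argument parsing, warn, and reset the flag.
void check_compiler_flags(CompilerFlags& f) {
  if (f.mode == ExecMode::Int && f.use_compiler) {
    std::fprintf(stderr, "warning: -XX:+UseCompiler ignored with -Xint\n");
    f.use_compiler = false;
  }
}
```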

From tobias.hartmann at oracle.com  Thu Apr 21 07:35:17 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Apr 2016 09:35:17 +0200
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler"
	options
In-Reply-To: <5717E3CB.6060107@oracle.com>
References: <5717884D.2020108@oracle.com> <57178C03.1010902@oracle.com>
	<5717909F.5040004@oracle.com> <5717E3CB.6060107@oracle.com>
Message-ID: <571882B5.2090706@oracle.com>

Hi Vladimir,

thanks for the review!

On 20.04.2016 22:17, Vladimir Kozlov wrote:
> Another interesting combination is -Xshare:dump with -Xcomp (or -XX:+UseCompiler), because -Xshare:dump tries to disable compilation.

With "-Xshare:dump -XX:+UseCompiler", UseCompiler is disabled by this fix.
With "-Xshare:dump -Xcomp", UseCompiler is enabled but no compilations are triggered because DumpSharedSpaces is set and no bytecode is executed.

I checked several other combinations and did not encounter any problems.
 
> I think the fix is good for -Xint -XX:+UseCompiler combination.

Okay, I'm going to push this with the changes Dmitry suggested.

Thanks,
Tobias

> 
> Thanks,
> Vladimir
> 
> On 4/20/16 7:22 AM, Tobias Hartmann wrote:
>> Hi Zoltan,
>>
>> On 20.04.2016 16:02, Zoltán Majó wrote:
>>> There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). Do you know if re-enabling any of those causes problems?
>>
>> I checked and combining them with -Xint does not cause any problems because they are guarded by UseCompiler.
>>
>>> Otherwise it looks good to me.
>>
>> Thanks for the review!
>>
>> Best regards,
>> Tobias
>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>> On 04/20/2016 03:46 PM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following patch:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8086068
>>>> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/
>>>>
>>>> The VM crashes in product builds, or fails with an assert in debug builds, if both -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that no compilation will be triggered if Arguments::mode() == _int. Although -Xint first disables UseCompiler, the flag may be re-enabled if it is set on the command line.
>>>>
>>>> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
>>>>
>>>> Tested with regression test and RBT (running).
>>>>
>>>> Thanks,
>>>> Tobias
>>>

From adinn at redhat.com  Thu Apr 21 07:43:41 2016
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 21 Apr 2016 08:43:41 +0100
Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <1461172110.2941.63.camel@mylittlepony.linaroharston>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
	<CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
	<57163063.3020506@redhat.com>
	<1461172110.2941.63.camel@mylittlepony.linaroharston>
Message-ID: <571884AD.1030906@redhat.com>

On 20/04/16 18:08, Edward Nevill wrote:
> If people are happy, could I prepare the final changeset for review based on bzero4 (i.e., this one)?
> 
> http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04

Yeah, science rocks!

bzero4 is it for me, but you need a nod from the boss.

regards,


Andrew Dinn
-----------


From rwestrel at redhat.com  Thu Apr 21 08:23:31 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 21 Apr 2016 10:23:31 +0200
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
Message-ID: <57188E03.5070303@redhat.com>

http://cr.openjdk.java.net/~roland/8154826/webrev.00/

The aarch64 port implicitly transforms:
(AddP base (AddP base address (LShiftL index con)) offset)
into:
(AddP base (AddP base offset) (LShiftL index con))
in the ad file to embed the shift (and possibly an i2l conversion) into
the addressing mode of a memory operation. Exposing that transformation
in the ideal graph allows:

- (AddP base offset) to be scheduled (for instance outside a loop)
- multiple identical (AddP base offset) to be commoned
- (LShiftL index con) to be cloned during matching so that each memory
access has its own copy

Roland.
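Because unsigned integer addition is associative (it wraps modulo 2^64), the two AddP shapes compute the same address; what changes is that base + offset becomes a separate value that can be hoisted out of a loop or commoned. A small C++ model of that, with hypothetical function names standing in for the ideal-graph nodes. The AArch64 remark in the comments assumes the register-offset form, whose shift must be 0 or log2 of the access size.

```cpp
#include <cstdint>

// Shape before the change: (base + (index << con)) + offset
uint64_t addr_before(uint64_t base, uint64_t index, unsigned con, uint64_t offset) {
  return (base + (index << con)) + offset;
}

// Shape after reassociation: (base + offset) + (index << con).
// Unsigned arithmetic wraps modulo 2^64, so both shapes give the same address.
uint64_t addr_after(uint64_t base, uint64_t index, unsigned con, uint64_t offset) {
  return (base + offset) + (index << con);
}

// With the reassociated shape, base + offset is loop invariant: it is computed
// once, and each access only adds the shifted index, which on AArch64 can fold
// into a register-offset address such as [x1, x2, lsl #3] for 8-byte loads.
uint64_t sum_of_addresses(uint64_t base, uint64_t offset, unsigned con, uint64_t n) {
  const uint64_t hoisted = base + offset;
  uint64_t acc = 0;
  for (uint64_t i = 0; i < n; ++i) {
    acc += hoisted + (i << con);
  }
  return acc;
}
```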

From tobias.hartmann at oracle.com  Thu Apr 21 10:11:03 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Apr 2016 12:11:03 +0200
Subject: [9] RFR(XS): 8086057: Crash with "modified node is not on
	IGVN._worklist" when running with -XX:-SplitIfBlocks
Message-ID: <5718A737.1050708@oracle.com>

Hi,

please review the following patch:

https://bugs.openjdk.java.net/browse/JDK-8086057
http://cr.openjdk.java.net/~thartmann/8086057/webrev.00/

The fastdebug VM crashes with "modified node is not on IGVN._worklist" because SuperWord::align_initial_loop_index() resets the input of the pre-loop Opaque1 node 'pre_opaq' without putting it on the IGVN worklist afterwards. This only shows up with -XX:-SplitIfBlocks because otherwise changes to surrounding nodes cause the Opaque1 node to be put on the worklist. The method should use PhaseIterGVN::replace_input_of(), which uses rehash_node_delayed() to ensure the modified node is put on the worklist.

Tested with regression test, Nashorn+Octane with -XX:-SplitIfBlocks and RBT (running).
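Why a raw input edit is not enough can be shown with a toy model: set_req() alone does no bookkeeping, while a replace_input_of()-style helper first records the node for revisiting. Node and IterGVN below are simplified stand-ins, not HotSpot's classes; in particular, the real rehash_node_delayed() also removes the node from the GVN hash table before it is modified.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model, not HotSpot's Node/PhaseIterGVN classes.
struct Node {
  std::vector<Node*> in;
  // Raw input edit: no GVN bookkeeping at all.
  void set_req(std::size_t i, Node* n) { in[i] = n; }
};

struct IterGVN {
  std::vector<Node*> worklist;

  bool on_worklist(const Node* n) const {
    return std::find(worklist.begin(), worklist.end(), n) != worklist.end();
  }

  // Stand-in for rehash_node_delayed(): queue the node for revisiting.
  // (The real one also removes it from the value-numbering hash table.)
  void rehash_node_delayed(Node* n) {
    if (!on_worklist(n)) {
      worklist.push_back(n);
    }
  }

  // Safe way to change an input during IGVN: record the node, then edit it.
  void replace_input_of(Node* n, std::size_t i, Node* new_in) {
    rehash_node_delayed(n);
    n->set_req(i, new_in);
  }
};
```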

Thanks,
Tobias

From zoltan.majo at oracle.com  Thu Apr 21 11:30:04 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 21 Apr 2016 13:30:04 +0200
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
Message-ID: <5718B9BC.10001@oracle.com>

Hi,


please review the patch for 8153340.

https://bugs.openjdk.java.net/browse/JDK-8153340


Problem: The VM crashes if AllocatePrefetchStyle==3 and 
AllocatePrefetchDistance==0. The crash happens due to the way the 
address for the first prefetch instruction is calculated [1]:

If distance==0, cache_adr == old_eden_top. Then, cache_adr &= 
~(AllocatePrefetchStepSize - 1), which can clear some of the low bits of 
cache_adr. That results in accesses *before* the newly allocated object.


Solution: Set lower limit of AllocatePrefetchDistance to 
AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). Unquarantine 
test.
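The arithmetic can be checked directly. Aligning down to a power-of-two step size moves an address back by at most step_size - 1 bytes, so with a distance of 0 the first prefetch can land before old_eden_top, while any distance >= step_size keeps it at or after old_eden_top. A sketch, assuming step_size is a power of two and using a hypothetical function name:

```cpp
#include <cstdint>

// First prefetch address as described above: start `distance` bytes past the
// old eden top, then align down to the prefetch step size.
// step_size must be a power of two for the mask to act as an align-down.
uintptr_t first_prefetch_addr(uintptr_t old_eden_top, uintptr_t distance,
                              uintptr_t step_size) {
  uintptr_t cache_adr = old_eden_top + distance;
  return cache_adr & ~(step_size - 1);
}
```

For example, with old_eden_top = 0x1010 and a step size of 64, a distance of 0 gives 0x1000 (before the object), while a distance of 64 gives 0x1040.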

Webrev:
http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/

Testing:
- JPRT (incl. TestOptionsWithRanges.java)
- local testing on a SPARC machine.

Thank you!

Best regards,


Zoltan

[1] 
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941

From tobias.hartmann at oracle.com  Thu Apr 21 13:56:56 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Apr 2016 15:56:56 +0200
Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if
	RangeCheckElimination is disabled
Message-ID: <5718DC28.7080502@oracle.com>

Hi,

please review the following patch:

https://bugs.openjdk.java.net/browse/JDK-8154763
http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/

JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled.

I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check.
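The added check amounts to short-circuiting on the flag before calling a helper whose precondition asserts it. A toy sketch with hypothetical names (LoopInfo, should_multiversion), not the actual PhaseIdealLoop code:

```cpp
#include <cassert>

// Hypothetical per-loop summary.
struct LoopInfo {
  bool has_range_checks;
};

// Helper with a precondition: only meaningful when RCE is enabled.
bool has_range_checks(const LoopInfo& loop, bool range_check_elimination) {
  assert(range_check_elimination && "only call when RCE is enabled");
  return loop.has_range_checks;
}

// Multiversioning exists to serve RCE, so skip it entirely when RCE is off;
// the && short-circuit keeps has_range_checks() from ever seeing RCE disabled.
bool should_multiversion(const LoopInfo& loop, bool range_check_elimination) {
  return range_check_elimination &&
         has_range_checks(loop, range_check_elimination);
}
```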

Tested with regression test and RBT (running).

Thanks,
Tobias


From michael.c.berg at intel.com  Thu Apr 21 14:42:05 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 21 Apr 2016 14:42:05 +0000
Subject: [9] RFR(XS): 8154763: Crash with
	"assert(RangeCheckElimination)" if RangeCheckElimination is disabled
In-Reply-To: <5718DC28.7080502@oracle.com>
References: <5718DC28.7080502@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E64033@FMSMSX102.amr.corp.intel.com>

Hi Tobias, sure, disabling multiversioning when RangeCheckElimination is not enabled makes sense.

Thanks,
Michael

-----Original Message-----
From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] 
Sent: Thursday, April 21, 2016 6:57 AM
To: hotspot-compiler-dev at openjdk.java.net
Cc: Berg, Michael C <michael.c.berg at intel.com>
Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled

Hi,

please review the following patch:

https://bugs.openjdk.java.net/browse/JDK-8154763
http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/

JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled.

I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check.

Tested with regression test and RBT (running).

Thanks,
Tobias


From tobias.hartmann at oracle.com  Thu Apr 21 14:44:40 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Apr 2016 16:44:40 +0200
Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)"
	if RangeCheckElimination is disabled
In-Reply-To: <C568518E7B433348B114B6A7122D474756E64033@FMSMSX102.amr.corp.intel.com>
References: <5718DC28.7080502@oracle.com>
	<C568518E7B433348B114B6A7122D474756E64033@FMSMSX102.amr.corp.intel.com>
Message-ID: <5718E758.8040301@oracle.com>

Hi Michael,

On 21.04.2016 16:42, Berg, Michael C wrote:
> Hi Tobias, sure when RangeCheckElimination is not enabled this makes sense.

thanks for the review!

Best regards,
Tobias


From jan.civlin at intel.com  Thu Apr 21 18:15:11 2016
From: jan.civlin at intel.com (Civlin, Jan)
Date: Thu, 21 Apr 2016 18:15:11 +0000
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <571807F1.10105@oracle.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
	<5717E0C4.8050006@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com>
	<571807F1.10105@oracle.com>
Message-ID: <39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com>

Vladimir,

I corrected the assert guards in the added instructions, as well as the guard for the sha-avx2 function itself.
Please look at 

http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.02/

Thank you,

J


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, April 20, 2016 3:51 PM
To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

Testing is still running, but it has already found the next problem when running tests with -XX:UseSSE=2:

#  Internal Error (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.cpp:3693), pid=52652, tid=3587
#  Error: assert(VM_Version::supports_ssse3()) failed

V  [libjvm.dylib+0x4193d7]  report_vm_error(char const*, int, char const*, char const*, ...)+0xcd
V  [libjvm.dylib+0x1eedd2]  Assembler::vpshufb(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e
V  [libjvm.dylib+0x87c237]  MacroAssembler::sha256_AVX2(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297
V  [libjvm.dylib+0xa4dc47]  StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b


Vladimir
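
The assert above fires in vpshufb, which requires SSSE3. With -XX:UseSSE=2 the VM presumably reports SSSE3 as unavailable even on AVX2 hardware, so a guard that checks only for AVX2 is not enough. A minimal sketch of stub selection under that assumption follows; `CpuFeatures`, `Sha256Stub` and `select_sha256_stub` are hypothetical names, not HotSpot's VM_Version API, and the exact feature set checked by webrev.02 is not shown in the thread.

```cpp
// Hypothetical snapshot of the CPU features the VM reports after applying
// flags such as -XX:UseSSE (not HotSpot's VM_Version API).
struct CpuFeatures {
  bool lp64;   // 64-bit VM
  bool sha;    // SHA extensions, as reported by supports_sha()
  bool ssse3;  // may be reported as absent when UseSSE < 3
  bool avx2;
};

enum class Sha256Stub { None, ShaExtensions, Avx2 };

// Pick the SHA-256 stub. The AVX2 path is 64-bit only and also emits
// vpshufb, so SSSE3 must be checked in addition to AVX2.
Sha256Stub select_sha256_stub(const CpuFeatures& cpu) {
  if (cpu.sha) {
    return Sha256Stub::ShaExtensions;
  }
  if (cpu.lp64 && cpu.avx2 && cpu.ssse3) {
    return Sha256Stub::Avx2;
  }
  return Sha256Stub::None;
}
```

The same predicate is what a test would need to consult when deciding whether to expect the "SHA instructions are not available on this CPU" warning.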

On 4/20/16 1:13 PM, Civlin, Jan wrote:
> Thank you, Vladimir.
>
> I guess it was a warning.
> I usually keep a comma after the last enumerator so that I don't have to change existing lines when I add new ones.
>
>
> Section 6.7.2.2 of C99 lists the syntax as:
>
> enum-specifier:
>      enum identifier_opt { enumerator-list }
>      enum identifier_opt { enumerator-list , }
>      enum identifier
> enumerator-list:
>      enumerator
>      enumerator-list , enumerator
> enumerator:
>      enumeration-constant
>      enumeration-constant = constant-expression
>
>
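The grammar above allows a trailing comma after the last enumerator in C99, and C++ allows it from C++11 on; C89 and C++98/03 do not, which is presumably why one of the build compilers complained (the thread does not say which one). A small illustration with hypothetical enumerator names, not taken from the patch:

```cpp
// Hypothetical stack-layout enum; the sizes are illustrative only.
enum FrameLayoutSketch {
  SLOT_A      = 0,
  SLOT_A_SIZE = 16,
  SLOT_B      = SLOT_A + SLOT_A_SIZE,
  SLOT_B_SIZE = 8,
  FRAME_SIZE  = SLOT_B + SLOT_B_SIZE,  // trailing comma: valid in C99 and C++11+
};

static_assert(FRAME_SIZE == 24, "layout offsets add up");
```

Dropping the final comma, as was done locally, keeps the code valid under both the older and the newer standards.
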
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 20, 2016 1:04 PM
> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler 
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no 
> supports_sha() available)
>
> One thing caught during the build is the ',' after the last enumerator:
>
> +  STACK_SIZE = _RSP      + _RSP_SIZE,
> +};
>
> The compiler complains about it, so I removed it in my local repo.
>
> Vladimir
>
> On 4/20/16 12:07 PM, Civlin, Jan wrote:
>> Thank you!
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, April 20, 2016 11:38 AM
>> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler 
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>> supports_sha() available)
>>
>> Looks good to me. I submitted testing on all platforms before integrating.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/20/16 3:11 AM, Civlin, Jan wrote:
>>> Vladimir,
>>>
>>> Please look at the updated patch at
>>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>>>
>>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>>>
>>> The k256_W table is actually twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>>>
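The k256_W layout described above (each 4-constant, 128-bit line of k256 repeated twice, so that both halves of a 256-bit AVX2 row hold the same round constants) can be sketched as a host-side table expansion. The name `expand_k256_w` is hypothetical; the patch instead emits the table into the stub area via a generator such as `generate_k256_W()`, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand a round-constant table so that each 128-bit "line" (4 x 32-bit
// constants) appears twice in a row: one 256-bit row per line, with
// identical low and high halves.
std::vector<uint32_t> expand_k256_w(const std::vector<uint32_t>& k256) {
  const std::size_t kLine = 4;  // 32-bit constants per 128-bit line
  assert(k256.size() % kLine == 0 && "table must hold whole lines");
  std::vector<uint32_t> out;
  out.reserve(k256.size() * 2);
  for (std::size_t i = 0; i < k256.size(); i += kLine) {
    for (int copy = 0; copy < 2; ++copy) {
      out.insert(out.end(), k256.begin() + i, k256.begin() + i + kLine);
    }
  }
  return out;
}
```

For the 64 SHA-256 round constants this yields a 128-entry table, matching "twice the size of k256".
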
>>> The patch was tested in three configurations (slowdebug, release and fastdebug) on 64-bit Windows and Linux.
>>>
>>> Thank you,
>>>
>>> J
>>>
>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>> provider = SUN
>>> algorithm = SHA-256
>>> msgSize = 1024 bytes
>>> offset = 0
>>> iters = 10000000
>>> warmupIters = 20000
>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>> TestSHA runtime = 28.756324129 seconds
>>> TestSHA throughput = 356.09558280340946 MB/s
>>>
>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>> provider = SUN
>>> algorithm = SHA-256
>>> msgSize = 1024 bytes
>>> offset = 0
>>> iters = 10000000
>>> warmupIters = 20000
>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>> TestSHA runtime = 28.912701124 seconds
>>> TestSHA throughput = 354.1696071938408 MB/s
>>>
>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>> provider = SUN
>>> algorithm = SHA-256
>>> msgSize = 1024 bytes
>>> offset = 0
>>> iters = 10000000
>>> warmupIters = 20000
>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>> TestSHA runtime = 29.339789962 seconds
>>> TestSHA throughput = 349.01408678325697 MB/s
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Monday, April 18, 2016 5:09 PM
>>> To: Civlin, Jan; hotspot compiler
>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>>> supports_sha() available)
>>>
>>> Hi Jan,
>>>
>>> The patch was generated on Windows and has ^M (carriage return) at the end of lines, so I can't apply it to our sources.
>>>
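The ^M characters are the carriage returns of Windows CRLF line endings; a tool such as dos2unix normally strips them before a patch is exported. As a minimal sketch of what that normalization does (the function name `crlf_to_lf` is hypothetical):

```cpp
#include <cstddef>
#include <string>

// Convert CRLF ("\r\n") line endings to LF ("\n"). A lone '\r' that is not
// followed by '\n' is kept unchanged.
std::string crlf_to_lf(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
      continue;  // drop the '\r' of a CRLF pair
    }
    out.push_back(text[i]);
  }
  return out;
}
```
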
>>> I don't see any uses of the new [v]movdqa(), vpsrldq() and vpslldq() instructions.
>>>
>>> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>>>
>>> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>>>
>>> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>>>
>>> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>>>> == Correction in the subject line ===
>>>>
>>>> We would like to contribute the SHA256 AVX2 intrinsic.
>>>>
>>>> This intrinsic is for x86 AVX2 processors when supports_sha() is not available. It is 64-bit code only.
>>>>
>>>> The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.
>>>>
>>>> Contributor: Jan Civlin.
>>>>
>>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>>>

From vladimir.kozlov at oracle.com  Thu Apr 21 19:30:40 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Apr 2016 12:30:40 -0700
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
	<5717E0C4.8050006@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com>
	<571807F1.10105@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com>
Message-ID: <57192A60.5030206@oracle.com>

Good. But testing found that the jtreg SHA tests fail because they don't expect SHA2 to be supported:

"Expected message not found: 'SHA instructions are not available on this CPU'"

hotspot/test/compiler/intrinsics/sha/

Do you know how to run jtreg tests? Maybe Sandhya or Michael can help.

Thanks,
Vladimir


From jan.civlin at intel.com  Thu Apr 21 19:58:18 2016
From: jan.civlin at intel.com (Civlin, Jan)
Date: Thu, 21 Apr 2016 19:58:18 +0000
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <57192A60.5030206@oracle.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
	<5717E0C4.8050006@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com>
	<571807F1.10105@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com>
	<57192A60.5030206@oracle.com>
Message-ID: <39F83597C33E5F408096702907E6C4500F16C8DB@ORSMSX104.amr.corp.intel.com>

I know how to run jtreg.
I tested it with TestSHA.jar, although admittedly that was a standalone run.

$ /cygdrive/E/Java/sha-041116/build/windows-x86_64-normal-server-fastdebug/jdk/bin/java.exe -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:UseSSE=2 -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 33.507065719 seconds
TestSHA throughput = 305.60718404516876 MB/s
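
The throughput figures in these runs are consistent with bytes hashed in the timed loop divided by runtime, with MB = 10^6 bytes and the warmup iterations excluded. This is inferred from the printed numbers, not from TestSHA's source, and `throughput_mb_per_s` is a hypothetical name:

```cpp
#include <cstdint>

// Throughput as TestSHA appears to report it: msgSize * iters / runtime,
// expressed in units of 10^6 bytes per second (warmup not counted).
double throughput_mb_per_s(std::uint64_t msg_size_bytes, std::uint64_t iters,
                           double runtime_seconds) {
  const double bytes =
      static_cast<double>(msg_size_bytes) * static_cast<double>(iters);
  return bytes / runtime_seconds / 1e6;
}
```

With msgSize = 1024 bytes and iters = 10000000 the hashed volume is 10,240,000,000 bytes, so 33.507065719 s gives about 305.6 MB/s and the 28.756324129 s release-build run earlier in the thread gives about 356.1 MB/s. The two runs are on different hosts (JCIVLIN-DESK vs HSW-EP02), so the numbers are not directly comparable.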



I'll take a look at what is wrong with the jtreg tests.



From christian.thalinger at oracle.com  Thu Apr 21 21:18:36 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 21 Apr 2016 11:18:36 -1000
Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)"
	if RangeCheckElimination is disabled
In-Reply-To: <5718DC28.7080502@oracle.com>
References: <5718DC28.7080502@oracle.com>
Message-ID: <F2D693AF-F0A4-4B1D-BBC6-465B3A0EC667@oracle.com>

Not a review, but I think PostLoopMultiversioning should be a diagnostic_pd flag. I've added a comment to this enhancement:

https://bugs.openjdk.java.net/browse/JDK-8150900



From christian.thalinger at oracle.com  Thu Apr 21 21:34:38 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 21 Apr 2016 11:34:38 -1000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<57162A88.7030608@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>
	<5717D7E8.5000108@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com>
Message-ID: <74581C88-CC26-441E-933B-73954C56F077@oracle.com>


> On Apr 20, 2016, at 2:13 PM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:
> 
> Hi
>  
> The correct URL for the updated webrev is this.
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ <http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/>
+void MacroAssembler::mathfunc(address runtime_entry) {
I don't like the name of this method, mainly because it's only aligning the stack (shouldn't that happen somewhere else?) and doing this 0x20 stack frame thing, which I still don't understand.

Right, this is the one I was thinking about:

void MacroAssembler::call_VM_leaf_base(address entry_point, int num_args) {
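
For context on the 0x20 question: 0x20 bytes is the size of the register-argument "shadow space" that the Windows x64 calling convention requires a caller to reserve, and, as we understand it, call_VM_leaf_base on 64-bit Windows reserves that space and then pads rsp to 16-byte alignment before the call. Whether the 0x20 in the proposed mathfunc() serves the same purpose is an assumption. The sketch below only models that arithmetic with a hypothetical helper; it does not emit code.

```cpp
#include <cstdint>

// Shadow (home) space the Windows x64 calling convention requires the
// caller to reserve for the four register arguments.
constexpr std::uintptr_t kWin64ShadowSpace = 0x20;

// Models the rsp adjustment before a leaf call: reserve shadow space on
// Windows x64, then re-align rsp to 16 bytes. Assumes rsp is already
// 8-byte aligned, so a single 8-byte pad is enough.
std::uintptr_t adjust_rsp_for_leaf_call(std::uintptr_t rsp, bool win64) {
  if (win64) {
    rsp -= kWin64ShadowSpace;
  }
  if ((rsp & 15) != 0) {
    rsp -= 8;  // one 8-byte pad restores 16-byte alignment
  }
  return rsp;
}
```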

> Sorry for the spam.
>  
> Regards,
> Vivek
>  
> From: Deshpande, Vivek R 
> Sent: Wednesday, April 20, 2016 5:10 PM
> To: Deshpande, Vivek R; 'Nils Eliasson'; 'hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>'
> Cc: 'Vladimir Kozlov'; 'Volker Simonis'; 'Christian Thalinger'; Viswanathan, Sandhya
> Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>  
> Sent out the wrong link by mistake.
>  
> updated webrev: 
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ <http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/>
>  
> Regards
> Vivek
>  
>  
> From: Deshpande, Vivek R 
> Sent: Wednesday, April 20, 2016 5:07 PM
> To: 'Nils Eliasson'; hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
> Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya
> Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>  
> Hi Nils
>  
> I have updated the webrev with all the suggestions.
> updated webrev: 
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ <http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/>
> Thanks for your comments and review.
>  
> @Vladimir,
> I have taken care of all the comments. Would you please review and sponsor the patch.
>  
> Thanks and regards,
> Vivek 
>  
> From: Nils Eliasson [mailto:nils.eliasson at oracle.com <mailto:nils.eliasson at oracle.com>] 
> Sent: Wednesday, April 20, 2016 12:27 PM
> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
> Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya
> Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>  
> In vmSymbols.cpp together with the other flag checks.
> 
> Regards,
> Nils
> 
> On 2016-04-20 02:44, Deshpande, Vivek R wrote:
> HI Nils
>  
> Yes, you are right: the function accesses the command line flag DisableIntrinsic, and the changes are static.
> Could you point me to the right location for the function?
> Also I have updated the webrev with rest of the comments here:
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ <http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/>
>  
> Regards,
> Vivek
>  
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net <mailto:hotspot-compiler-dev-bounces at openjdk.java.net>]On Behalf Of Nils Eliasson
> Sent: Tuesday, April 19, 2016 5:55 AM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>  
> Hi Vivek,
> 
> The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command-line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context, that is OK, but the function doesn't belong in the compilerDirectives files if it doesn't use directives.  
> 
> Regards,
> Nils
> 
> On 2016-04-18 19:38, Deshpande, Vivek R wrote:
> Hi all
>  
> I would like to contribute a patch that helps to control the intrinsics in the interpreter, C1 and C2 by disabling the stub generation.
> It uses the -XX:DisableIntrinsic option to do so.
> Could you please review and sponsor this patch?
>  
> Bug-id: 
> https://bugs.openjdk.java.net/browse/JDK-8154473
> webrev:
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/
>  
> Thanks and regards,
> Vivek
>  
>  
>  


From vladimir.kozlov at oracle.com  Thu Apr 21 22:08:43 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Apr 2016 15:08:43 -0700
Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha()
	available)
In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C8DB@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com>
	<57157726.4030701@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com>
	<5717CC83.3070401@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com>
	<5717E0C4.8050006@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com>
	<571807F1.10105@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com>
	<57192A60.5030206@oracle.com>
	<39F83597C33E5F408096702907E6C4500F16C8DB@ORSMSX104.amr.corp.intel.com>
Message-ID: <57194F6B.4050008@oracle.com>

I think some tests assume that x86 does not support SHA-2.

The following tests failed:

compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java

java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA-224 and SHA-256 crypto hash functions not available on this CPU.'.
JVM should start with '-XX:+UseSHA256Intrinsics' flag, but output should contain warning.

compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java

java.lang.AssertionError: Expected message not found: 'SHA instructions are not available on this CPU'.
JVM should start with '-XX:+UseSHA' flag, but output should contain warning.

compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java

java.lang.RuntimeException: Unexpected count of intrinsic  _sha2_implCompress is expected:false, matched: 2, suspected: 5

compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java

java.lang.RuntimeException: Unexpected count of intrinsic  _digestBase_implCompressMB is expected:false, matched: 1, suspected: 6
	
Regards,
Vladimir

On 4/21/16 12:58 PM, Civlin, Jan wrote:
> I know how to run jtreg.
> I tested it with TestSHA.jar, though admittedly that was a standalone run.
>
> $ /cygdrive/E/Java/sha-041116/build/windows-x86_64-normal-server-fastdebug/jdk/bin/java.exe -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:UseSSE=2 -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
> provider = SUN
> algorithm = SHA-256
> msgSize = 1024 bytes
> offset = 0
> iters = 10000000
> warmupIters = 20000
> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
> TestSHA runtime = 33.507065719 seconds
> TestSHA throughput = 305.60718404516876 MB/s
>
>
> jcivlin at JCIVLIN-DESK /cygdrive/C/Java/hotspot/test/civlin/TestSHA/TestSHA/dist
> $
>
> I'll take a look at what is wrong with jtreg.
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 21, 2016 12:31 PM
> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)
>
> Good. But testing found that the jtreg SHA tests failed because they don't expect SHA-2 to be supported:
>
> "Expected message not found: 'SHA instructions are not available on this CPU'"
>
> hotspot/test/compiler/intrinsics/sha/
>
> Do you know how to run jtreg tests? Maybe Sandhya or Michael can help.
>
> Thanks,
> Vladimir
>
> On 4/21/16 11:15 AM, Civlin, Jan wrote:
>> Vladimir,
>>
>> I corrected the assert guards in the added instructions, as well as the guard for the sha-avx2 function itself.
>> Please look at
>>
>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.02/
>>
>> Thank you,
>>
>> J
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, April 20, 2016 3:51 PM
>> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>> supports_sha() available)
>>
>> Testing is still running, but it has already found the next problem when running tests with -XX:UseSSE=2:
>>
>> #  Internal Error (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.cpp:3693), pid=52652, tid=3587
>> #  Error: assert(VM_Version::supports_ssse3()) failed
>>
>> V  [libjvm.dylib+0x4193d7]  report_vm_error(char const*, int, char const*, char const*, ...)+0xcd
>> V  [libjvm.dylib+0x1eedd2]  Assembler::vpshufb(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e
>> V  [libjvm.dylib+0x87c237]  MacroAssembler::sha256_AVX2(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297
>> V  [libjvm.dylib+0xa4dc47]  StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b
>>
>>
>> Vladimir
>>
>> On 4/20/16 1:13 PM, Civlin, Jan wrote:
>>> Thank you, Vladimir.
>>>
>>> I guess it was a warning.
>>> I usually keep a comma after the last enum constant so that I don't need to change existing lines when I add new ones.
>>>
>>>
>>> Section 6.7.2.2 of C99 lists the syntax as:
>>>
>>> enum-specifier:
>>>        enum identifieropt { enumerator-list }
>>>        enum identifieropt { enumerator-list , }
>>>        enum identifier
>>> enumerator-list:
>>>        enumerator
>>>        enumerator-list , enumerator
>>> enumerator:
>>>        enumeration-constant
>>>        enumeration-constant = constant-expression
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, April 20, 2016 1:04 PM
>>> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>>> supports_sha() available)
>>>
>>> One thing caught during the build is the ',' after the last enumerator:
>>>
>>> +  STACK_SIZE = _RSP      + _RSP_SIZE,
>>> +};
>>>
>>> The compiler complains about it, so I removed it in my local repo.
>>>
>>> Vladimir
>>>
>>> On 4/20/16 12:07 PM, Civlin, Jan wrote:
>>>> Thank you!
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Wednesday, April 20, 2016 11:38 AM
>>>> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler
>>>> <hotspot-compiler-dev at openjdk.java.net>
>>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>>>> supports_sha() available)
>>>>
>>>> Looks good to me. I submitted testing on all platforms before integrating.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/20/16 3:11 AM, Civlin, Jan wrote:
>>>>> Vladimir,
>>>>>
>>>>> Please look at the updated patch at
>>>>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>>>>>
>>>>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>>>>>
>>>>> k256_W is actually a table twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>>>>>
>>>>> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> J
>>>>>
>>>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>>>> provider = SUN
>>>>> algorithm = SHA-256
>>>>> msgSize = 1024 bytes
>>>>> offset = 0
>>>>> iters = 10000000
>>>>> warmupIters = 20000
>>>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>>>> TestSHA runtime = 28.756324129 seconds
>>>>> TestSHA throughput = 356.09558280340946 MB/s
>>>>>
>>>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>>>> provider = SUN
>>>>> algorithm = SHA-256
>>>>> msgSize = 1024 bytes
>>>>> offset = 0
>>>>> iters = 10000000
>>>>> warmupIters = 20000
>>>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>>>> TestSHA runtime = 28.912701124 seconds
>>>>> TestSHA throughput = 354.1696071938408 MB/s
>>>>>
>>>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>>>> provider = SUN
>>>>> algorithm = SHA-256
>>>>> msgSize = 1024 bytes
>>>>> offset = 0
>>>>> iters = 10000000
>>>>> warmupIters = 20000
>>>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>>>> TestSHA runtime = 29.339789962 seconds
>>>>> TestSHA throughput = 349.01408678325697 MB/s
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Monday, April 18, 2016 5:09 PM
>>>>> To: Civlin, Jan; hotspot compiler
>>>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>>>>> supports_sha() available)
>>>>>
>>>>> Hi Jan,
>>>>>
>>>>> The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources.
>>>>>
>>>>> I don't see any uses of the new [v]movdqa(), vpsrldq(), vpslldq() instructions.
>>>>>
>>>>> Please, move new code in macroAssembler_x86_sha.cpp to the end of file.
>>>>>
>>>>> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>>>>>
>>>>> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>>>>>
>>>>> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>>>>>> == Correction in the subject line ===
>>>>>>
>>>>>> We would like to contribute the SHA256 AVX2 intrinsic.
>>>>>>
>>>>>> This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is LP64 code only.
>>>>>>
>>>>>> The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.
>>>>>>
>>>>>> Contributor: Jan Civlin.
>>>>>>
>>>>>>
>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>>>>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>>>>>
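The thread describes _k256_W as _k256 with each group of 4 round constants repeated, giving a table twice the size, and Vladimir suggests deriving it from _k256 at stub-generation time. A minimal C++ sketch of that layout follows. The helper name build_k256_W is hypothetical and this is not the code from the webrev; the duplication is presumably there so that a single 256-bit load delivers the same four constants to both 128-bit lanes, but treat that reading as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the webrev's stub generator code): builds a doubled
// table in which every group of 4 consecutive round constants from the base
// table appears twice in a row.
std::vector<std::uint32_t> build_k256_W(const std::vector<std::uint32_t>& k256) {
  assert(k256.size() % 4 == 0);  // the base table is consumed 4 entries at a time
  std::vector<std::uint32_t> w;
  w.reserve(k256.size() * 2);
  for (std::size_t i = 0; i < k256.size(); i += 4) {
    for (int copy = 0; copy < 2; ++copy) {
      // Append the same group of 4 constants once per copy.
      w.insert(w.end(), k256.begin() + i, k256.begin() + i + 4);
    }
  }
  return w;
}
```

For the full 64-entry SHA-256 constant table this yields 128 entries, matching the "twice the size" description above.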

From dean.long at oracle.com  Thu Apr 21 22:09:31 2016
From: dean.long at oracle.com (Dean Long)
Date: Thu, 21 Apr 2016 15:09:31 -0700
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <57188E03.5070303@redhat.com>
References: <57188E03.5070303@redhat.com>
Message-ID: <c3d397f7-f909-4b65-42e6-78494e58fe9c@oracle.com>

Hi Roland.  This sounds like it has a lot of overlap with JDK-6217251.  
If so, could you update JDK-6217251 explaining what more, if anything, 
needs to be done?

thanks,

dl

On 4/21/2016 1:23 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8154826/webrev.00/
>
> The aarch64 port implicitly transforms:
> (AddP base (AddP base address (LShiftL index con)) offset)
> into:
> (AddP base (AddP base offset) (LShiftL index con))
> in the ad file to embed the shift (and possibly an i2l conversion) into
> the addressing mode of a memory operation. Exposing that transformation
> in the ideal graph allows:
>
> - (AddP base offset) to be scheduled (for instance outside a loop)
> - multiple identical (AddP base offset) to be commoned
> - (LShiftL index con) to be cloned during matching so that each memory
> access has its own
>
> Roland.
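The reshaping Roland describes relies on address arithmetic being reassociable: base + (index << con) + offset is the same address as (base + offset) + (index << con), but in the second form the base + offset part does not depend on the index, so it can be scheduled once (for example outside a loop) and shared between accesses. A minimal C++ sketch of that equivalence with plain integers standing in for ideal-graph nodes; the function names are hypothetical and the optional i2l conversion is left out.

```cpp
#include <cstdint>

// Shape before the change: (AddP base (AddP base address (LShiftL index con)) offset)
std::uint64_t addr_before(std::uint64_t base, std::uint64_t index,
                          unsigned con, std::uint64_t offset) {
  std::uint64_t scaled = base + (index << con);  // depends on index
  return scaled + offset;                        // constant offset added last
}

// Shape after the change: (AddP base (AddP base offset) (LShiftL index con)).
// base_plus_offset does not depend on index, so it can be computed once and
// commoned; the remaining "+ (index << con)" is what an AArch64 register-offset
// access such as ldr x0, [x1, x2, lsl #3] can absorb.
std::uint64_t addr_after(std::uint64_t base_plus_offset, std::uint64_t index,
                         unsigned con) {
  return base_plus_offset + (index << con);
}
```

Because unsigned arithmetic wraps modulo 2^64, both forms agree for every input, which is what makes the reassociation safe.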


From vladimir.kozlov at oracle.com  Thu Apr 21 22:18:47 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Apr 2016 15:18:47 -0700
Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)"
	if RangeCheckElimination is disabled
In-Reply-To: <5718DC28.7080502@oracle.com>
References: <5718DC28.7080502@oracle.com>
Message-ID: <571951C7.3060806@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/21/16 6:56 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8154763
> http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/
>
> JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled.
>
> I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check.
>
> Tested with regression test and RBT (running).
>
> Thanks,
> Tobias
>

From vladimir.kozlov at oracle.com  Thu Apr 21 22:37:21 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Apr 2016 15:37:21 -0700
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <5718B9BC.10001@oracle.com>
References: <5718B9BC.10001@oracle.com>
Message-ID: <57195621.7050307@oracle.com>

Hi, Zoltan

I think we should change code in prefetch_allocation() instead:

       Node *cache_adr = new AddPNode(old_eden_top, old_eden_top,
                                      _igvn.MakeConX(step_size + distance));

This way we allow AllocatePrefetchDistance == 0 for all AllocatePrefetchStyle values, which is consistent.
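A minimal C++ sketch of just the address arithmetic, assuming (as in Zoltan's description quoted below) that the first prefetch address is rounded down to a multiple of AllocatePrefetchStepSize, a power of two. Rounding down can move an address back by up to step - 1 bytes, so top + 0 can land before the new object, while top + step + distance rounded down is always at least top + 1. The function names are hypothetical; this is not the macro.cpp code.

```cpp
#include <cassert>
#include <cstdint>

// Round x down to a multiple of step (a power of two), mirroring
// cache_adr &= ~(AllocatePrefetchStepSize - 1).
std::uint64_t align_down(std::uint64_t x, std::uint64_t step) {
  assert(step != 0 && (step & (step - 1)) == 0);
  return x & ~(step - 1);
}

// First prefetch address in the reported shape: top + distance.
std::uint64_t first_prefetch_old(std::uint64_t top, std::uint64_t distance,
                                 std::uint64_t step) {
  return align_down(top + distance, step);
}

// First prefetch address in the suggested shape: top + step_size + distance.
std::uint64_t first_prefetch_new(std::uint64_t top, std::uint64_t distance,
                                 std::uint64_t step) {
  return align_down(top + step + distance, step);
}
```

With top = 0x1008 and a 64-byte step, the old shape gives 0x1000 (before the object) and the suggested shape gives 0x1040.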

Thanks,
Vladimir

On 4/21/16 4:30 AM, Zoltán Majó wrote:
> Hi,
>
>
> please review the patch for 8153340.
>
> https://bugs.openjdk.java.net/browse/JDK-8153340
>
>
> Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way the address for the first prefetch instruction is calculated [1]:
>
> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1), which can clear some of the low-order bits of cache_adr. That results in accesses *before* the newly allocated object.
>
>
> Solution: Set lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). Unquarantine test.
>
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>
> Testing:
> - JPRT (incl. TestOptionsWithRanges.java)
> - local testing on a SPARC machine.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941

From vladimir.kozlov at oracle.com  Thu Apr 21 22:39:05 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Apr 2016 15:39:05 -0700
Subject: [9] RFR(XS): 8086057: Crash with "modified node is not on
	IGVN._worklist" when running with -XX:-SplitIfBlocks
In-Reply-To: <5718A737.1050708@oracle.com>
References: <5718A737.1050708@oracle.com>
Message-ID: <57195689.1050508@oracle.com>

Good.

Thanks,
Vladimir

On 4/21/16 3:11 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8086057
> http://cr.openjdk.java.net/~thartmann/8086057/webrev.00/
>
> The fastdebug VM crashes with "modified node is not on IGVN._worklist" because SuperWord::align_initial_loop_index() resets the input of the pre-loop Opaque1 node 'pre_opaq' without putting it on the IGVN worklist afterwards. This only shows up with -XX:-SplitIfBlocks because otherwise surrounding nodes are changed, which causes the Opaque1 node to be put on the worklist. The method should use PhaseIterGVN::replace_input_of(), which uses rehash_node_delayed() to ensure the modified node is put on the worklist.
>
> Tested with regression test, Nashorn+Octane with -XX:-SplitIfBlocks and RBT (running).
>
> Thanks,
> Tobias
>
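The invariant behind the assert is that any node whose inputs change during IGVN must end up on the worklist. A toy C++ model of the difference between mutating an input directly and going through a replace_input_of()-style helper; ToyNode and ToyIterGVN are hypothetical stand-ins, not HotSpot's Node or PhaseIterGVN.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy node with numbered inputs (not HotSpot's Node).
struct ToyNode {
  std::vector<ToyNode*> in;
};

// Toy stand-in for PhaseIterGVN that only tracks which nodes need revisiting.
struct ToyIterGVN {
  std::set<ToyNode*> worklist;

  // Analogue of replace_input_of(): record the node before changing it,
  // as the thread says rehash_node_delayed() does in the real code.
  void replace_input_of(ToyNode* n, std::size_t i, ToyNode* v) {
    worklist.insert(n);
    n->in[i] = v;
  }

  // Analogue of the debug check: a modified node must be on the worklist.
  bool verify_modified(ToyNode* n) const { return worklist.count(n) != 0; }
};
```

Writing n->in[i] directly changes the node but leaves the worklist untouched, which is the situation the fastdebug check reports.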

From tom.rodriguez at oracle.com  Fri Apr 22 01:22:35 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 21 Apr 2016 18:22:35 -0700
Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly
	handle private methods in interfaces
Message-ID: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com>

http://cr.openjdk.java.net/~never/8152903/webrev 

JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change: if the type is an interface, no meaningful answer can be provided. I updated the tests and the interface a little to reflect this. 

Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little to ensure that this code isn't executed for the compiler thread. It passes the graal gate with these changes.  A modified version of the test which found the issue also passes now.  I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes.  https://bugs.openjdk.java.net/browse/JDK-8154904

tom

From igor.veresov at oracle.com  Fri Apr 22 03:23:32 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 21 Apr 2016 20:23:32 -0700
Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly
	handle private methods in interfaces
In-Reply-To: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com>
References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com>
Message-ID: <80797340-B145-42F2-ABED-902197C9AC17@oracle.com>

Looks good!

igor

> On Apr 21, 2016, at 6:22 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~never/8152903/webrev 
> 
> JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change: if the type is an interface, no meaningful answer can be provided. I updated the tests and the interface a little to reflect this. 
> 
> Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little to ensure that this code isn't executed for the compiler thread. It passes the graal gate with these changes.  A modified version of the test which found the issue also passes now.  I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes.  https://bugs.openjdk.java.net/browse/JDK-8154904
> 
> tom


From tobias.hartmann at oracle.com  Fri Apr 22 05:19:06 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 22 Apr 2016 07:19:06 +0200
Subject: [9] RFR(XS): 8086057: Crash with "modified node is not on
	IGVN._worklist" when running with -XX:-SplitIfBlocks
In-Reply-To: <57195689.1050508@oracle.com>
References: <5718A737.1050708@oracle.com> <57195689.1050508@oracle.com>
Message-ID: <5719B44A.8080200@oracle.com>

Thanks, Vladimir!

Best regards,
Tobias

On 22.04.2016 00:39, Vladimir Kozlov wrote:
> Good.
> 
> Thanks,
> Vladimir
> 
> On 4/21/16 3:11 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8086057
>> http://cr.openjdk.java.net/~thartmann/8086057/webrev.00/
>>
>> The fastdebug VM crashes with "modified node is not on IGVN._worklist" because SuperWord::align_initial_loop_index() resets the input of the pre-loop Opaque1 node 'pre_opaq' without putting it on the IGVN worklist afterwards. This only shows up with -XX:-SplitIfBlocks because otherwise surrounding nodes are changed, which causes the Opaque1 node to be put on the worklist. The method should use PhaseIterGVN::replace_input_of(), which uses rehash_node_delayed() to ensure the modified node is put on the worklist.
>>
>> Tested with regression test, Nashorn+Octane with -XX:-SplitIfBlocks and RBT (running).
>>
>> Thanks,
>> Tobias
>>

From tobias.hartmann at oracle.com  Fri Apr 22 05:20:01 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 22 Apr 2016 07:20:01 +0200
Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)"
	if RangeCheckElimination is disabled
In-Reply-To: <F2D693AF-F0A4-4B1D-BBC6-465B3A0EC667@oracle.com>
References: <5718DC28.7080502@oracle.com>
	<F2D693AF-F0A4-4B1D-BBC6-465B3A0EC667@oracle.com>
Message-ID: <5719B481.2090100@oracle.com>

Hi Chris,

On 21.04.2016 23:18, Christian Thalinger wrote:
> Not a review, but I think PostLoopMultiversioning should be a diagnostic_pd.  I've added a comment to this Enhancement:
> 
> https://bugs.openjdk.java.net/browse/JDK-8150900

Yes, I agree. We should avoid adding too many new product flags.

Thanks,
Tobias

> 
>> On Apr 21, 2016, at 3:56 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>
>> Hi,
>>
>> please review the following patch:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8154763
>> http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/
>>
>> JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled.
>>
>> I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check.
>>
>> Tested with regression test and RBT (running).
>>
>> Thanks,
>> Tobias
>>
> 

From tobias.hartmann at oracle.com  Fri Apr 22 05:20:21 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 22 Apr 2016 07:20:21 +0200
Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)"
	if RangeCheckElimination is disabled
In-Reply-To: <571951C7.3060806@oracle.com>
References: <5718DC28.7080502@oracle.com> <571951C7.3060806@oracle.com>
Message-ID: <5719B495.9010103@oracle.com>

Thanks, Vladimir!

Best regards,
Tobias

On 22.04.2016 00:18, Vladimir Kozlov wrote:
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 4/21/16 6:56 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8154763
>> http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/
>>
>> JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled.
>>
>> I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check.
>>
>> Tested with regression test and RBT (running).
>>
>> Thanks,
>> Tobias
>>

From rwestrel at redhat.com  Fri Apr 22 12:38:32 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 22 Apr 2016 14:38:32 +0200
Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64
Message-ID: <571A1B48.8040304@redhat.com>

http://cr.openjdk.java.net/~roland/8154939/webrev.00/

8153998 added a test that assumes SuperWordLoopUnrollAnalysis is enabled,
but that is not the case on aarch64. As a consequence, vectorization
no longer triggers on aarch64.

Roland.

From nils.eliasson at oracle.com  Fri Apr 22 13:53:11 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 22 Apr 2016 15:53:11 +0200
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <5717E190.5070107@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com>
	<5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com>
	<5717DF06.3090305@oracle.com> <5717E190.5070107@oracle.com>
Message-ID: <571A2CC7.30506@oracle.com>

Hi,

Background:
The compilelog can get corrupted and the VM may assert on "failed: bad 
tag in log" when printing opto_assembly. (PrintAssembly turns on printing 
of opto_assembly if hs_dis is not present.)

When printing opto_assembly in output.cpp we may lose the ttylock 
(break_tty_lock_for_safepoint) due to a safepoint in both print_metadata 
and dump_asm. Another thread can then claim the lock and start printing. When 
the safepoint is over, both threads will think they own the lock. The 
content will look OK thanks to the xml stream adding the writing-thread 
tag to the log. The closing xml tag has two problems: 1) it uses a 
raw_print and may get intermingled with other output, and 2) the xml tag 
stack tracking may see a bad sequence of tags.

Solution:
Retake the ttylock before printing the closing print_optoassembly tag. 
(I have only observed this safepoint issue with print_optoassembly.)

If another thread already has the lock and is printing print_nmethod, for 
example, print opto_assembly will block. Here we can have two variants:

1) the other thread prints something else (like print_nmethod) - 
then that tag will be closed before releasing the lock, and the tag 
stack will be consistent, but the output may look like 
<print_optoassembly>...<print_nmethod>...</print_nmethod></print_optoassembly>

2) the other thread is also printing opto_assembly. Then that thread may 
yield the lock during a safepoint while the first one is retaking the 
lock. Then we can get <print_optoassembly id=1><print_optoassembly 
id=2></print_optoassembly (for id 1)></print_optoassembly (for id 2)> or 
<print_optoassembly id=1><print_optoassembly id=2></print_optoassembly 
(for id 2)></print_optoassembly (for id 1)>

Fortunately the xml stack consistency will be ok since it doesn't make 
any difference on what thread wrote the print_optoassembly tag.
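Both variants can be checked against a minimal model of the tag stack. Assuming the consistency check only compares each closing tag's name with the innermost open tag (writer ids and attributes play no role), both interleavings above are balanced, while the sequence that asserts today (print_optoassembly closed while print_nmethod is still open) is not. This is a toy model, not HotSpot's xmlStream code.

```cpp
#include <string>
#include <vector>

// Toy checker: opening tags are pushed, and a closing tag ("/name") must
// match the most recently opened tag. All tags must be closed at the end.
bool tags_balanced(const std::vector<std::string>& events) {
  std::vector<std::string> stack;
  for (const std::string& e : events) {
    if (e.size() > 1 && e[0] == '/') {  // closing tag, e.g. "/print_nmethod"
      if (stack.empty() || stack.back() != e.substr(1)) return false;
      stack.pop_back();
    } else {                            // opening tag, e.g. "print_nmethod"
      stack.push_back(e);
    }
  }
  return stack.empty();
}
```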

Pre-mortem
If this issue pops-up again we must investigate if there are more places 
in the compile log code that yields the tty lock on a safepoint. 
NoSafePointVerifiers don't seem to check on all transitions.

Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02

Testing:
All regression tests.

Regards,
Nils Eliasson

On 2016-04-20 22:07, Vladimir Kozlov wrote:
> On 4/20/16 12:56 PM, Nils Eliasson wrote:
>> Hi,
>>
>> Thanks for the help,
>>
>> I got it to work, and added NoSafePointVerifiers to make sure I hadn't
>> missed anything. Then after many test iterations it failed again. It
>> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get
>> a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to
>> the temporary buffer too, or relax the tag checks in the xml and accept
>> that the output may need to be sorted by writer-thread before use. The
>> output looks like:
>>
>> <writer id=1/>
>> <print_optoassembly>
>> ...
>> releases tty when blocking on a safepoint
>> <writer id=2/>
>> <print_nmethod>
>> ...
>> <writer id=1/>         // back again after the safepoint, writing without the ttylock now
>> </print_optoassembly>  // Here we fail on an assert today, since we expect a closing print_nmethod tag
>> <writer id=2/>
>> </print_nmethod>
>>
>> This is malformed xml but has enough information to be reconstructed.
>> Would this be an acceptable output?
>
> Yes, I think it is acceptable - we don't lose information. And it is 
> not worse than it was before.
>
> Thanks,
> Vladimir
>
>>
>> Regards,
>> Nils
>>
>>
>> On 2016-04-18 19:30, Vladimir Kozlov wrote:
>>> tty would have the same problem, but it uses C_HEAP to allocate:
>>>
>>>     defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream();
>>>
>>> Please, look if you can do something similar.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/18/16 4:24 AM, Nils Eliasson wrote:
>>>> Resizable is better, but then we assert on expanding the stringbuffer
>>>> while under a different ResourceMark.
>>>>
>>>> Regards,
>>>> Nils
>>>>
>>>> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>>>>> Use resizable stream:
>>>>>
>>>>> stringStream(size_t initial_bufsize = 256);
>>>>>
>>>>> 1024 may not be enough.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Please review this fix of print opto_assembly.
>>>>>>
>>>>>> Summary:
>>>>>> The compilelog can get corrupted and the VM may assert on "failed:
>>>>>> bad tag in log".
>>>>>>
>>>>>> When printing assembly in output.cpp we first take the ttylock, print
>>>>>> the head and then the method metadata. However, the metadata printing
>>>>>> makes a VM entry and may block for a safepoint, and will then release
>>>>>> the lock (break_tty_lock_for_safepoint). After that, some other compiler
>>>>>> thread that hasn't safepointed will take the lock, and the log will be
>>>>>> broken when the safepoint is over and the first thread starts logging again.
>>>>>>
>>>>>> Solution:
>>>>>> Print the method metadata to a temporary buffer, then take the tty
>>>>>> lock.
>>>>>>
>>>>>> Testing:
>>>>>> Repro from bug stops failing.
>>>>>> Running :hotspot_all
>>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) 
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>>>>
>>>>>> Regards,
>>>>>> Nils Eliasson
>>>>
>>
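The first version of this fix (webrev.01: print the method metadata to a temporary buffer, then take the tty lock) follows a general pattern: do anything that may block while holding no lock, and keep the critical section down to one copy of the finished text. A minimal C++ sketch with std::mutex standing in for the tty lock; tty_lock, tty_output and print_method_block are hypothetical names, and HotSpot itself uses ttyLocker and stringStream rather than these types.

```cpp
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the shared tty and its lock.
std::mutex tty_lock;
std::string tty_output;

// Format everything into a private buffer first (in the VM this is the part
// that may block at a safepoint), then hold the lock only while appending the
// finished text, so no other writer can interleave inside this block.
void print_method_block(const std::string& method_name, int id) {
  std::ostringstream buf;  // private to this thread, no lock needed
  buf << "<print_optoassembly id='" << id << "'>\n";
  buf << "  method: " << method_name << "\n";  // stand-in for metadata printing
  buf << "</print_optoassembly>\n";
  std::lock_guard<std::mutex> guard(tty_lock);  // short critical section
  tty_output += buf.str();
}
```

As the later messages in this thread show, the pattern only helps if every step that can reach a safepoint is moved into the buffer; dump_asm still blocked on a VM entry.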


From bharadwaj.yadavalli at oracle.com  Fri Apr 22 16:56:24 2016
From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli)
Date: Fri, 22 Apr 2016 12:56:24 -0400
Subject: Missing aarch64 changes in hs due to mismerge from hs-comp to hs
Message-ID: <571A57B8.3080002@oracle.com>

As part of the hs-comp to hs push I did yesterday, it appears that I 
have mismerged the following  conflicting pushes to
src/cpu/aarch64/vm/macroAssembler_aarch64.hpp done on hs tree and 
hs-comp tree.

Fix for 8153310 
(http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/a07a10329f31) pushed to hs
Fix for 8153713 
(http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f9545cf437eb) 
pushed to hs-comp
Fix for 8153797 
(http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d4636cc092db) 
pushed to hs-comp

I picked the version in hs repo and overrode the changes from hs-comp.

I think the next push up to hs should consider merging both the changes 
in hs as well as in hs-comp.

Thanks,

Bharadwaj

From aph at redhat.com  Fri Apr 22 17:02:10 2016
From: aph at redhat.com (Andrew Haley)
Date: Fri, 22 Apr 2016 18:02:10 +0100
Subject: Missing aarch64 changes in hs due to mismerge from hs-comp to hs
In-Reply-To: <571A57B8.3080002@oracle.com>
References: <571A57B8.3080002@oracle.com>
Message-ID: <571A5912.4060101@redhat.com>

On 04/22/2016 05:56 PM, S. Bharadwaj Yadavalli wrote:
> As part of the hs-comp to hs push I did yesterday, it appears that I 
> have mismerged the following  conflicting pushes to
> src/cpu/aarch64/vm/macroAssembler_aarch64.hpp done on hs tree and 
> hs-comp tree.
> 
> Fix for 8153310 
> (http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/a07a10329f31) pushed to hs
> Fix for 8153713 
> (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f9545cf437eb) 
> pushed to hs-comp
> Fix for 8153797 
> (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d4636cc092db) 
> pushed to hs-comp
> 
> I picked the version in hs repo and overrode the changes from hs-comp.
> 
> I think the next push up to hs should consider merging both the changes 
> in hs as well as in hs-comp.

OK.  I think that the bustage is relatively minor.  The AArch64 port
is broken anyway at the moment because of one of the security patches.
I'll fix it.

Andrew.


From bharadwaj.yadavalli at oracle.com  Fri Apr 22 17:04:38 2016
From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli)
Date: Fri, 22 Apr 2016 13:04:38 -0400
Subject: Missing aarch64 changes in hs due to mismerge from hs-comp to hs
In-Reply-To: <571A5912.4060101@redhat.com>
References: <571A57B8.3080002@oracle.com> <571A5912.4060101@redhat.com>
Message-ID: <571A59A6.9060902@oracle.com>

Thanks, Andrew!

Bharadwaj

On 04/22/2016 01:02 PM, Andrew Haley wrote:
> On 04/22/2016 05:56 PM, S. Bharadwaj Yadavalli wrote:
>> As part of the hs-comp to hs push I did yesterday, it appears that I
>> have mismerged the following  conflicting pushes to
>> src/cpu/aarch64/vm/macroAssembler_aarch64.hpp done on hs tree and
>> hs-comp tree.
>>
>> Fix for 8153310
>> (http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/a07a10329f31) pushed to hs
>> Fix for 8153713
>> (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f9545cf437eb)
>> pushed to hs-comp
>> Fix for 8153797
>> (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d4636cc092db)
>> pushed to hs-comp
>>
>> I picked the version in hs repo and overrode the changes from hs-comp.
>>
>> I think the next push up to hs should consider merging both the changes
>> in hs as well as in hs-comp.
> OK.  I think that the bustage is relatively minor.  The AArch64 port
> is broken anyway at the moment because of one of the security patches.
> I'll fix it.
>
> Andrew.
>
>


From tom.rodriguez at oracle.com  Fri Apr 22 17:12:46 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 22 Apr 2016 10:12:46 -0700
Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly
	handle private methods in interfaces
In-Reply-To: <80797340-B145-42F2-ABED-902197C9AC17@oracle.com>
References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com>
	<80797340-B145-42F2-ABED-902197C9AC17@oracle.com>
Message-ID: <92ABC0EE-5375-42A3-ADEA-E0BDAE208A9B@oracle.com>

Thanks!

tom

> On Apr 21, 2016, at 8:23 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> Looks good!
> 
> igor
> 
>> On Apr 21, 2016, at 6:22 PM, Tom Rodriguez <tom.rodriguez at oracle.com <mailto:tom.rodriguez at oracle.com>> wrote:
>> 
>> http://cr.openjdk.java.net/~never/8152903/webrev <http://cr.openjdk.java.net/~never/8152903/webrev> 
>> 
>> JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change: if the type is an interface, no meaningful answer can be provided. I updated tests and the interface a little to reflect this. 
>> 
>> Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes.  A modified version of the test which found the issue also passes now.  I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes.  https://bugs.openjdk.java.net/browse/JDK-8154904 <https://bugs.openjdk.java.net/browse/JDK-8154904>
>> 
>> tom
> 


From michael.c.berg at intel.com  Fri Apr 22 17:44:05 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 22 Apr 2016 17:44:05 +0000
Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64
In-Reply-To: <571A1B48.8040304@redhat.com>
References: <571A1B48.8040304@redhat.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E643B1@FMSMSX102.amr.corp.intel.com>

The change seems OK since unroll analysis is not enabled for aarch64.

-Michael

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin
Sent: Friday, April 22, 2016 5:39 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64

http://cr.openjdk.java.net/~roland/8154939/webrev.00/

8153998 added a test that assumes SuperWordLoopUnrollAnalysis is enabled but it's not the case on aarch64. As a consequence, vectorization doesn't trigger on aarch64 anymore.

Roland.

From vladimir.kozlov at oracle.com  Fri Apr 22 18:04:28 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Apr 2016 11:04:28 -0700
Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64
In-Reply-To: <571A1B48.8040304@redhat.com>
References: <571A1B48.8040304@redhat.com>
Message-ID: <ae3bcef8-601e-16f9-5d6e-2669b1d366ce@oracle.com>

Looks good. I will sponsor it after testing.

Thanks,
Vladimir

On 4/22/16 5:38 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8154939/webrev.00/
>
> 8153998 added a test that assumes SuperWordLoopUnrollAnalysis is enabled
> but it's not the case on aarch64. As a consequence, vectorization
> doesn't trigger on aarch64 anymore.
>
> Roland.
>

From vladimir.kozlov at oracle.com  Sat Apr 23 01:00:28 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Apr 2016 18:00:28 -0700
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <571A2CC7.30506@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com>
	<5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com>
	<5717DF06.3090305@oracle.com> <5717E190.5070107@oracle.com>
	<571A2CC7.30506@oracle.com>
Message-ID: <571AC92C.30906@oracle.com>

Yes, this fix is good enough.

Usually when engineers are using print assembly they know about this problem and use CICompilerCount=1 to have only one compiler thread to print output.

Thanks,
Vladimir

On 4/22/16 6:53 AM, Nils Eliasson wrote:
> Hi,
>
> Background:
> The compilelog can get corrupted and the VM may assert on "failed: bad tag in log" when printing opto_assembly. (Print assembly turns on print opto_assembly if hs_dis is not present.)
>
> When printing opto_assembly in output.cpp we may lose the ttylock (break_tty_lock_for_safepoint) due to a safepoint in both print_metadata and dump_asm. Another thread can claim the lock and start
> printing. When the safepoint is over both threads will think they own the lock. The content will look ok thanks to the xml stream adding the writing thread tag to the log. The closing xml-tag has two
> problems: 1) It uses a raw_print and may get intermingled with other output 2) The xml tag stack tracking may see a bad sequence of tags.
>
> Solution:
> Retake the ttylock before printing the closing print_optoassembly tag. (I have only observed this safepoint issue with print_optoassembly.)
>
> If another thread already has the lock and is printing print_nmethod, for example, print opto_assembly will block. Here we can have two variants:
>
> 1) the other thread will print something else (like print_nmethod) - then that tag will be closed before releasing the lock, and the tag stack will be consistent but the output may look like
> <print_optoassembly>...<print_nmethod>...</print_nmethod></print_optoassembly>
>
> 2) the other thread is also printing opto_assembly. Then that thread may yield the lock during a safepoint while the first one is retaking the lock. Then we can get <print_optoassembly
> id=1><print_optoassembly id=2></print_optoassembly (for id 1)></print_optoassembly (for id 2)> or <print_optoassembly id=1><print_optoassembly id=2></print_optoassembly (for id 2)></print_optoassembly
> (for id 1)>
>
> Fortunately the xml stack consistency will be ok since it doesn't make any difference which thread wrote the print_optoassembly tag.
>
> Pre-mortem
> If this issue pops up again we must investigate if there are more places in the compile log code that yield the tty lock on a safepoint. NoSafePointVerifiers don't seem to check on all transitions.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02
>
> Testing:
> All regression tests.
>
> Regards,
> Nils Eliasson
>
> On 2016-04-20 22:07, Vladimir Kozlov wrote:
>> On 4/20/16 12:56 PM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> Thanks for the help,
>>>
>>> I got it to work, and added NoSafePointVerifiers to make sure I hadn't
>>> missed anything. Then after many test iterations it failed again. It
>>> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get
>>> a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to
>>> the temporary buffer too, or relax the tag checks in the xml and accept
>>> that the output may need to be sorted by writer-thread before use. The
>>> output looks like:
>>>
>>> <writer id=1/>
>>> <print_optoassembly>
>>> ...
>>> releases tty when blocking on a safepoint
>>> <writer id=2/>
>>> <print_nmethod>
>>> ...
>>> <writer id=1/>         // back again after safepoint writing without ttylock now.
>>> </print_optoassembly>  // Here we fail on an assert today when we expect a closing print_nmethod tag
>>> <writer id=2/>
>>> </print_nmethod>
>>>
>>> This is malformed xml but has enough information to be reconstructed.
>>> Would this be an acceptable output?
>>
>> Yes, I think it is acceptable - we don't lose information. And it is not worse than it was before.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Regards,
>>> Nils
>>>
>>>
>>> On 2016-04-18 19:30, Vladimir Kozlov wrote:
>>>> tty would have the same problem but it uses C_HEAP to allocate:
>>>>
>>>>     defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream();
>>>>
>>>> Please, look if you can do something similar.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/18/16 4:24 AM, Nils Eliasson wrote:
>>>>> Resizable is better, but then we assert on expanding the stringbuffer
>>>>> while being under a different ResourceMark.
>>>>>
>>>>> Regards,
>>>>> Nils
>>>>>
>>>>> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>>>>>> Use resizable stream:
>>>>>>
>>>>>> stringStream(size_t initial_bufsize = 256);
>>>>>>
>>>>>> 1024 may not be enough.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please review this fix of print opto_assembly.
>>>>>>>
>>>>>>> Summary:
>>>>>>> The compilelog can get corrupted and the VM may assert on "failed:
>>>>>>> bad tag in log".
>>>>>>>
>>>>>>> When printing assembly in output.cpp we first take the ttylock, print
>>>>>>> the head and then the method metadata. However, the metadata printing
>>>>>>> makes a VM entry and may block for a safepoint, and will then release
>>>>>>> the lock (break_tty_lock_for_safepoint). After that, one of the other
>>>>>>> compiler threads that haven't safepointed will take the lock, and the
>>>>>>> log is broken once the safepoint is over and the first thread starts
>>>>>>> logging again.
>>>>>>>
>>>>>>> Solution:
>>>>>>> Print the method metadata to a temporary buffer, then take the tty
>>>>>>> lock.
>>>>>>>
>>>>>>> Testing:
>>>>>>> Repro from bug stops failing.
>>>>>>> Running :hotspot_all
>>>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nils Eliasson
>>>>>
>>>
>
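Nils notes that the interleaved output quoted above is malformed XML but still carries enough information to be reconstructed. A minimal C++ sketch of that reconstruction, assuming each <writer id=N/> marker sits on its own line: split the lines by writer, then check open/close nesting per writer with a tag stack. split_by_writer and tags_balanced are hypothetical helpers, not part of the VM or of the LogCompilation tool.

```cpp
#include <cassert>  // assert() in the usage checks
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split an interleaved log into per-writer streams. A line of the form
// "<writer id=N/>" switches the current writer; every other line belongs
// to whichever writer was announced last.
std::map<int, std::vector<std::string>> split_by_writer(const std::string& log) {
  std::map<int, std::vector<std::string>> streams;
  std::istringstream in(log);
  std::string line;
  int current = 0;  // hypothetical default writer before any marker
  const std::string marker = "<writer id=";
  while (std::getline(in, line)) {
    if (line.compare(0, marker.size(), marker) == 0) {
      current = std::stoi(line.substr(marker.size()));  // stops at "/>"
      continue;
    }
    streams[current].push_back(line);
  }
  return streams;
}

// Extract the element name from "<name ...>" or "</name>".
static std::string tag_name(const std::string& line, std::size_t start) {
  std::size_t end = line.find_first_of(" />", start);
  return line.substr(start, end - start);
}

// Check that open/close tags nest correctly within one stream.
bool tags_balanced(const std::vector<std::string>& lines) {
  std::vector<std::string> stack;
  for (const std::string& line : lines) {
    if (line.size() < 2 || line[0] != '<') continue;  // text content
    if (line[1] == '/') {                             // closing tag
      if (stack.empty() || stack.back() != tag_name(line, 2)) return false;
      stack.pop_back();
    } else if (line.compare(line.size() - 2, 2, "/>") != 0) {  // opening tag
      stack.push_back(tag_name(line, 1));
    }  // self-closing tags such as <writer id=1/> are ignored
  }
  return stack.empty();
}
```

Run over the unsplit example, the stack check fails at </print_optoassembly>, mirroring the "bad tag in log" assert, while each per-writer stream balances. As case 2) of the webrev.02 mail points out, a check that compares only tag names cannot tell two interleaved print_optoassembly elements apart.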

From vivek.r.deshpande at intel.com  Sat Apr 23 01:10:08 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Sat, 23 Apr 2016 01:10:08 +0000
Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com>

Hi all

I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154975
webrev:
http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/

Thanks and regards,
Vivek



From vladimir.kozlov at oracle.com  Sat Apr 23 01:54:04 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Apr 2016 18:54:04 -0700
Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com>
Message-ID: <571AD5BC.10309@oracle.com>

Hi Vivek,

How is it related to the following macro instructions from JDK-8153998?

+  // special instructions for EVEX
+  void setvectmask(Register dst, Register src);
+  void restorevectmask();

Can you reuse them, or add variants that you can use? I see a difference (kmovql vs kmovdl) in the code.

_programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking


I don't like the following code in the assembler instructions:
+  if (zeroing) attributes.set_is_clear_context();

+  if (!no_reg_mask) {
+    attributes.set_embedded_opmask_register_specifier(mask);
+    if (zeroing) attributes.set_is_clear_context();
+  }

zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them.
_embedded_opmask_register_specifier is not used (only set). Don't add values which are not used.

Thanks,
Vladimir

On 4/22/16 6:10 PM, Deshpande, Vivek R wrote:
> Hi all
>
> I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic.
>
> Could you please review and sponsor this patch.
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154975
>
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/
>
> Thanks and regards,
>
> Vivek
>

From michael.c.berg at intel.com  Sat Apr 23 02:30:00 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sat, 23 Apr 2016 02:30:00 +0000
Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
In-Reply-To: <571AD5BC.10309@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com>
	<571AD5BC.10309@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E644EF@FMSMSX102.amr.corp.intel.com>

Vladimir, it is not related to 8153998.
It seems his patch is not sync'd against the head of the tree.

-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Friday, April 22, 2016 6:54 PM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Berg, Michael C <michael.c.berg at intel.com>
Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512

Hi Vivek,

How is it related to the following macro instructions from JDK-8153998?

+  // special instructions for EVEX
+  void setvectmask(Register dst, Register src);
+  void restorevectmask();

Can you reuse them, or add variants that you can use? I see a difference (kmovql vs kmovdl) in the code.

_programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking


I don't like the following code in the assembler instructions:
+  if (zeroing) attributes.set_is_clear_context();

+  if (!no_reg_mask) {
+    attributes.set_embedded_opmask_register_specifier(mask);
+    if (zeroing) attributes.set_is_clear_context();
+  }

zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them.
_embedded_opmask_register_specifier is not used (only set). Don't add values which are not used.

Thanks,
Vladimir

On 4/22/16 6:10 PM, Deshpande, Vivek R wrote:
> Hi all
>
> I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic.
>
> Could you please review and sponsor this patch.
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154975
>
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/
>
> Thanks and regards,
>
> Vivek
>

From michael.c.berg at intel.com  Sun Apr 24 02:14:14 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sun, 24 Apr 2016 02:14:14 +0000
Subject: CR for RFR 8154896
Message-ID: <C568518E7B433348B114B6A7122D474756E6469B@FMSMSX102.amr.corp.intel.com>

Hi Folks,

I would like to contribute a bug fix for SKX/EVEX code gen.  There is a guarantee of is8bit(imm8) for jccb that can sometimes fail when upper bank register marshaling is required for instructions without EVEX or with only conditional EVEX support on SKX.  This patch addresses the minimal set of changes that can have this issue.

This code was tested as follows (see jbs entry below):

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896

webrev:
http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/

Thanks,
Michael
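For context on the jccb guarantee: a short conditional jump is a 2-byte instruction (opcode plus a signed 8-bit displacement measured from the end of the instruction), so extra code emitted between the jump and its label, such as the register marshaling mentioned above, can push the distance outside -128..127. A minimal C++ sketch of that range check; fits_in_int8 and short_jump_disp are illustrative names, not HotSpot functions, and the sketch does not describe the approach taken in the webrev.

```cpp
#include <cassert>  // assert() in the usage checks
#include <cstdint>

// Signed 8-bit range check (the range an is8bit-style guarantee enforces).
constexpr bool fits_in_int8(std::intptr_t x) { return -0x80 <= x && x < 0x80; }

// Displacement of a 2-byte short jump (opcode + rel8) located at `branch`
// and targeting `target`: rel8 is measured from the end of the instruction.
constexpr std::intptr_t short_jump_disp(std::intptr_t branch, std::intptr_t target) {
  return target - (branch + 2);
}
```

A forward target 129 bytes past the jump's start still fits (displacement 127), one byte further does not.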

From michael.c.berg at intel.com  Sun Apr 24 02:23:57 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sun, 24 Apr 2016 02:23:57 +0000
Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
In-Reply-To: <C568518E7B433348B114B6A7122D474756E644EF@FMSMSX102.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com>
	<571AD5BC.10309@oracle.com>
	<C568518E7B433348B114B6A7122D474756E644EF@FMSMSX102.amr.corp.intel.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E646B3@FMSMSX102.amr.corp.intel.com>

More info:

Setvectmask and restorevectmask are used for auto code generation.

The method attached here is used entirely for stub code assembly and is intended only for stub code usage and checked via assertions as such.

Regards,
Michael

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Friday, April 22, 2016 7:30 PM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512

Vladimir, it is not related to 8153998.
It seems his patch is not sync'd against the head of the tree.

-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, April 22, 2016 6:54 PM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Berg, Michael C <michael.c.berg at intel.com>
Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512

Hi Vivek,

How is it related to the following macro instructions from JDK-8153998?

+  // special instructions for EVEX
+  void setvectmask(Register dst, Register src);
+  void restorevectmask();

Can you reuse them, or add variants that you can use? I see a difference (kmovql vs kmovdl) in the code.

_programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking


I don't like the following code in the assembler instructions:
+  if (zeroing) attributes.set_is_clear_context();

+  if (!no_reg_mask) {
+    attributes.set_embedded_opmask_register_specifier(mask);
+    if (zeroing) attributes.set_is_clear_context();
+  }

zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them.
_embedded_opmask_register_specifier is not used (only set). Don't add values which are not used.

Thanks,
Vladimir

On 4/22/16 6:10 PM, Deshpande, Vivek R wrote:
> Hi all
>
> I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic.
>
> Could you please review and sponsor this patch.
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154975
>
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/
>
> Thanks and regards,
>
> Vivek
>

From tobias.hartmann at oracle.com  Mon Apr 25 06:38:03 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 25 Apr 2016 08:38:03 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <571784C7.6020304@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com>
Message-ID: <571DBB4B.2070801@oracle.com>

Hi,

I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).

If there are no objections, I would like to push the basic version:
http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/

If unaligned performance turns out to be a problem, we can still improve the intrinsic.

Thanks,
Tobias
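The runtime alignment check quoted below ([3]) relies on the fact that two addresses are equally misaligned exactly when (a ^ b) & 7 == 0. A minimal C++ sketch of the resulting three-way classification; the Alignment enum and classify() are hypothetical names. Only the "different skew" case would need a skewed 64-bit loop; equally skewed arrays could, in principle, use 8-byte loads after a common prologue.

```cpp
#include <cassert>  // assert() in the usage checks
#include <cstdint>

enum class Alignment { BothAligned, SameSkew, DifferentSkew };

// Classify two addresses with respect to 8-byte (doubleword) loads.
Alignment classify(std::uintptr_t a, std::uintptr_t b) {
  if (((a ^ b) & 7) != 0) return Alignment::DifferentSkew;  // offsets mod 8 differ
  return (a & 7) == 0 ? Alignment::BothAligned               // both at offset 0
                      : Alignment::SameSkew;                 // same non-zero offset
}
```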

On 20.04.2016 15:31, Tobias Hartmann wrote:
> Hi John,
> 
> On 20.04.2016 03:46, John Rose wrote:
>> So I started looking at your code and my inner SPARC junkie took over.
>>
>> This is what happened:
>>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
> 
> Thanks a lot for having a look!
> 
>> Perhaps there are some ideas that might be helpful:
>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
> 
> Right, this simplifies the code a bit:
> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
> 
> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
> 
>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
> 
> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
> 
> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
> 
>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
> 
> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
> 
> What do you think?
> 
> Thanks,
> Tobias
> 
> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
> [3] Runtime alignment checks:
> 
> bind(Lunaligned);
> Label next;
> xor3(ary1, ary2, tmp);
> and3(tmp, 7, tmp);
> br_null_short(tmp, Assembler::pn, next);
> STOP("One array is unaligned!");
> should_not_reach_here();
> bind(next);
> STOP("Both arrays are unaligned!");
>  
>> On the other hand, what you wrote is nice and simple.
>>
>> HTH
>> -- John
>>
>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>> But that's neither nice nor simple.
>>
>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>
>>> Thanks, Vladimir!
>>>
>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>
>>> Okay, I'll push the basic version.
>>>
>>> For reference, here are the results on a SPARC T4:
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>
>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>> We do have arraycopy code for it but by default we don't use it:
>>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>
>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>
>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>>
>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>> ArraycopySrcPrefetchDistance (42) must be 0
>>> Error: Could not create the Java Virtual Machine.
>>> Error: A fatal exception has occurred. Program will exit
>>>
>>> Thanks,
>>> Tobias
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>> Hi,
>>>>>
>>>>> please review the following enhancement:
>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>
>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>
>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>>>>
>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>
>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>
>>>>> I evaluated the following three versions of the patch.
>>>>>
>>>>> -- Basic --
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>
>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>
>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>> Version "small" tries to improve this.
>>>>>
>>>>> -- Prefetching --
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>
>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>
>>>>> -- Small --
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>
>>>>> The numbers can be found here:
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>
>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>
>>>>> Thanks,
>>>>> Tobias
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>>>>

From mikael.gerdin at oracle.com  Mon Apr 25 07:59:17 2016
From: mikael.gerdin at oracle.com (Mikael Gerdin)
Date: Mon, 25 Apr 2016 09:59:17 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <571DBB4B.2070801@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
Message-ID: <571DCE55.6070506@oracle.com>

Hi Tobias

On 2016-04-25 08:38, Tobias Hartmann wrote:
> Hi,
>
> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).

I suspect that if you disable compressed klass pointers (or compressed 
oops), array contents are not always 8-byte aligned.

With compressed oops enabled, the mark word + compressed class pointer + 
array length add up to 16 bytes, and thus the array contents are 
guaranteed to be 8-byte aligned.
If compressed klass pointers are disabled, the array header becomes 
20 bytes, and I don't know how the contents are laid out in that case.
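A worked version of this header arithmetic, as a sketch assuming a 64-bit VM with an 8-byte mark word, a 4-byte (compressed) or 8-byte class pointer, and a 4-byte length field. Whether the 20-byte header is padded up to the next 8-byte HeapWord boundary is an assumption here, not something stated above; if it is, element 0 would still start 8-byte aligned.

```cpp
#include <cstddef>

// Assumed field sizes on a 64-bit VM (illustrative values, not read from HotSpot).
constexpr std::size_t kMarkWord = 8;
constexpr std::size_t kLengthField = 4;
constexpr std::size_t kHeapWord = 8;

constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Unpadded header: mark word + class pointer + array length.
constexpr std::size_t raw_header_bytes(bool compressed_class_ptrs) {
  return kMarkWord + (compressed_class_ptrs ? 4 : 8) + kLengthField;
}

// Offset of element 0, ASSUMING the header is rounded up to a HeapWord boundary.
constexpr std::size_t element0_offset(bool compressed_class_ptrs) {
  return align_up(raw_header_bytes(compressed_class_ptrs), kHeapWord);
}

static_assert(raw_header_bytes(true) == 16, "16-byte header with compressed class pointers");
static_assert(raw_header_bytes(false) == 20, "20-byte raw header without them");
static_assert(element0_offset(false) % 8 == 0, "padding would restore 8-byte alignment");
```

With compressed class pointers the three fields fill exactly 16 bytes; without them, 8-byte-aligned contents depend entirely on that padding.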

/Mikael

>
> If there are no objections, I would like to push the basic version:
> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>
> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>
> Thanks,
> Tobias
>
> On 20.04.2016 15:31, Tobias Hartmann wrote:
>> Hi John,
>>
>> On 20.04.2016 03:46, John Rose wrote:
>>> So I started looking at your code and my inner SPARC junkie took over.
>>>
>>> This is what happened:
>>>    http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>
>> Thanks a lot for having a look!
>>
>>> Perhaps there are some ideas that might be helpful:
>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>
>> Right, this simplifies the code a bit:
>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>
>> I did some experiments, but surprisingly this does not improve performance; it slightly degrades it. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>
>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>
>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>
>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>
>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>
>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>
>> What do you think?
>>
>> Thanks,
>> Tobias
>>
>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>> [3] Runtime alignment checks:
>>
>> bind(Lunaligned);
>> Label next;
>> xor3(ary1, ary2, tmp);
>> and3(tmp, 7, tmp);
>> br_null_short(tmp, Assembler::pn, next);
>> STOP("One array is unaligned!");
>> should_not_reach_here();
>> bind(next);
>> STOP("Both arrays are unaligned!");
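The two STOP messages distinguish the cases by XOR-ing the two addresses. A portable sketch of the same classification follows; the function and enum names are hypothetical, and it assumes (as the original RFR states) that array contents are always at least word (4-byte) aligned, so each address is either 0 or 4 modulo 8. Under that assumption a non-zero `(ary1 ^ ary2) & 7` means exactly one array is unaligned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification mirroring the xor3/and3 check above.
enum class Skew { BothAligned, OneUnaligned, BothUnaligned };

Skew classify(std::uintptr_t ary1, std::uintptr_t ary2) {
  // Assumption from the RFR: array contents are always word aligned.
  assert((ary1 & 3) == 0 && (ary2 & 3) == 0);
  if (((ary1 | ary2) & 7) == 0) {
    return Skew::BothAligned;    // the 8-byte loop can be used
  }
  if (((ary1 ^ ary2) & 7) != 0) {
    return Skew::OneUnaligned;   // low bits differ: one is 0 mod 8, the other 4 mod 8
  }
  return Skew::BothUnaligned;    // both are 4 mod 8 (same skew)
}
```

In the BothUnaligned case, a single 4-byte step would presumably leave both pointers 8-byte aligned, which is one reason that case is cheaper to handle than OneUnaligned.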
>>
>>> On the other hand, what you wrote is nice and simple.
>>>
>>> HTH
>>> John
>>>
>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>> But that's neither nice nor simple.
>>>
>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>
>>>> Thanks, Vladimir!
>>>>
>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>> Very good. Go with basic. We can do CPU-specific improvements later if needed.
>>>>
>>>> Okay, I'll push the basic version.
>>>>
>>>> For reference, here are the results on a SPARC T4:
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>
>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>   product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>   product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>
>>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>
>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>>>
>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>> Error: Could not create the Java Virtual Machine.
>>>> Error: A fatal exception has occurred. Program will exit
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> please review the following enhancement:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>
>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>
>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>>>>>
>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
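A portable C++ sketch of the loop described above, under stated assumptions: the function names are hypothetical (this is not the MacroAssembler code), the lengths are assumed to have been checked already, `len` is in bytes, loads are assembled big-endian to mimic SPARC so that bytes past the end land in the low-order bits, and the caller must guarantee that reading up to `width - 1` bytes past the end is safe (in the VM this presumably relies on the heap layout). Choosing the width by alignment is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load `width` bytes starting at p as a big-endian integer, like SPARC's
// lduw (width 4) or ldx (width 8): the first byte ends up most significant.
std::uint64_t load_be(const unsigned char* p, std::size_t width) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < width; ++i) v = (v << 8) | p[i];
  return v;
}

// Compare the first `len` bytes of a and b, `width` (4 or 8) bytes at a time.
// The final load may read up to width - 1 bytes past `len`, so both buffers
// must be readable that far.
bool array_equals_loop(const unsigned char* a, const unsigned char* b,
                       std::size_t len, std::size_t width) {
  assert(width == 4 || width == 8);
  std::size_t i = 0;
  for (; i + width <= len; i += width) {
    if (load_be(a + i, width) != load_be(b + i, width)) return false;
  }
  const std::size_t rem = len - i;  // 0 .. width - 1 bytes left
  if (rem == 0) return true;
  // Read over the end, then shift out the garbage: with big-endian loads
  // the bytes past `len` are the low-order (width - rem) bytes.
  const unsigned shift = static_cast<unsigned>((width - rem) * 8);
  return (load_be(a + i, width) >> shift) == (load_be(b + i, width) >> shift);
}
```

Because the shift discards exactly the bytes past `len`, whatever garbage follows the arrays never affects the result.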
>>>>>>
>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>
>>>>>> I evaluated the following three versions of the patch.
>>>>>>
>>>>>> -- Basic --
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>
>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>
>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>> Version "small" tries to improve this.
>>>>>>
>>>>>> -- Prefetching --
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>
>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>
>>>>>> -- Small --
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>
>>>>>> The numbers can be found here:
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>
>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>
>>>>>> Thanks,
>>>>>> Tobias
>>>>>>
>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>>>>>

From nils.eliasson at oracle.com  Mon Apr 25 08:33:20 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 25 Apr 2016 10:33:20 +0200
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes
	"assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <571AC92C.30906@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com>
	<5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com>
	<5717DF06.3090305@oracle.com> <5717E190.5070107@oracle.com>
	<571A2CC7.30506@oracle.com> <571AC92C.30906@oracle.com>
Message-ID: <571DD650.7070400@oracle.com>

Thank you Vladimir!

//Nils

On 2016-04-23 03:00, Vladimir Kozlov wrote:
> Yes, this fix is good enough.
>
> Usually when engineers are using print assembly they know about this 
> problem and use CICompilerCount=1 to have only one compiler thread to 
> print output.
>
> Thanks,
> Vladimir
>
> On 4/22/16 6:53 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Background:
>> The compilelog can get corrupted and the VM may assert on "failed: 
>> bad tag in log" when printing opto_assembly. (Print assembly turns on 
>> print opto_assembly if hs_dis is not present.)
>>
>> When printing opto_assembly in output.cpp we may lose the ttylock 
>> (break_tty_lock_for_safepoint) due to a safepoint in both 
>> print_metadata and dump_asm. Another thread can claim the lock and start
>> printing. When the safepoint is over, both threads will think they own 
>> the lock. The content will look ok thanks to the xml stream adding 
>> the writing thread tag to the log. The closing xml tag has two
>> problems: 1) it uses a raw_print and may get intermingled with other 
>> output, and 2) the xml tag stack tracking may see a bad sequence of tags.
>>
>> Solution:
>> Retake the ttylock before printing the closing print_optoassembly 
>> tag. (I have only observed this safepoint issue with 
>> print_optoassembly.)
>>
>> If another thread already holds the lock and is printing print_nmethod, for 
>> example, print_optoassembly will block. Here we can have two variants:
>>
>> 1) the other thread prints something else (like print_nmethod) - 
>> then that tag will be closed before releasing the lock, and the tag 
>> stack will be consistent, but the output may look like
>> <print_optoassembly>...<print_nmethod>...</print_nmethod></print_optoassembly> 
>>
>> 2) the other thread is also printing opto_assembly. Then that thread 
>> may yield the lock during a safepoint while the first one is retaking 
>> the lock. Then we can get either
>> <print_optoassembly id=1><print_optoassembly id=2></print_optoassembly (for id 1)></print_optoassembly (for id 2)>
>> or
>> <print_optoassembly id=1><print_optoassembly id=2></print_optoassembly (for id 2)></print_optoassembly (for id 1)>
>>
>> Fortunately, the xml tag stack stays consistent, since it makes no 
>> difference which thread wrote the print_optoassembly tag.
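To see why both variants keep the tag stack consistent while the original bug did not, here is a toy model of the check (a hypothetical class, not HotSpot's xmlStream). It assumes, as the paragraph above implies, that a closing tag only has to match the name of the innermost open tag and that the writer id plays no part.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (not HotSpot's xmlStream) of the compile log's tag-stack check.
class TagStack {
 public:
  void open(const std::string& tag) {
    assert(!tag.empty());
    stack_.push_back(tag);
  }
  // Returns false where the VM would assert "bad tag in log": the closing
  // tag must name the innermost open tag; the writer thread is not checked.
  bool close(const std::string& tag) {
    if (stack_.empty() || stack_.back() != tag) return false;
    stack_.pop_back();
    return true;
  }
  bool empty() const { return stack_.empty(); }

 private:
  std::vector<std::string> stack_;
};
```

In variant 2 the two closing print_optoassembly tags are interchangeable by name, so either order pops cleanly; only a close that names a tag other than the innermost one (the original failure) is rejected.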
>>
>> Pre-mortem
>> If this issue pops up again, we must investigate whether there are more 
>> places in the compile log code that yield the tty lock on a 
>> safepoint. NoSafePointVerifiers don't seem to check all transitions.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02
>>
>> Testing:
>> All regression tests.
>>
>> Regards,
>> Nils Eliasson
>>
>> On 2016-04-20 22:07, Vladimir Kozlov wrote:
>>> On 4/20/16 12:56 PM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> Thanks for the help,
>>>>
>>>> I got it to work, and added NoSafePointVerifiers to make sure I hadn't
>>>> missed anything. Then after many test iterations it failed again. It
>>>> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry 
>>>> to get
>>>> a ciSymbol->as_utf8. Now I am considering if I should direct 
>>>> dump_asm to
>>>> the temporary buffer too, or relax the tag checks in the xml and 
>>>> accept
>>>> that the output may need to be sorted by writer-thread before use. The
>>>> output looks like:
>>>>
>>>> <writer id=1/>
>>>> <print_optoassembly>
>>>> ...
>>>> releases tty when blocking on a safepoint
>>>> <writer id=2/>
>>>> <print_nmethod>
>>>> ...
>>>> <writer id=1/>         // back again after the safepoint, writing without the ttylock now
>>>> </print_optoassembly>  // here we fail on an assert today, since we expect a closing print_nmethod tag
>>>> <writer id=2/>
>>>> </print_nmethod>
>>>>
>>>> This is malformed xml but has enough information to be reconstructed.
>>>> Would this be an acceptable output?
>>>
>>> Yes, I think it is acceptable - we don't lose information. And it 
>>> is not worse than it was before.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Nils
>>>>
>>>>
>>>> On 2016-04-18 19:30, Vladimir Kozlov wrote:
>>>>> tty would have the same problem, but it uses C_HEAP to allocate:
>>>>>
>>>>>     defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream();
>>>>>
>>>>> Please, look if you can do something similar.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/18/16 4:24 AM, Nils Eliasson wrote:
>>>>>> Resizeable is better, but then we assert on expanding the 
>>>>>> stringbuffer
>>>>>> while being under a different ResourceMark.
>>>>>>
>>>>>> Regards,
>>>>>> Nils
>>>>>>
>>>>>> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>>>>>>> Use resizable stream:
>>>>>>>
>>>>>>> stringStream(size_t initial_bufsize = 256);
>>>>>>>
>>>>>>> 1024 may not be enough.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please review this fix of print opto_assembly.
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>> The compilelog can get corrupted and the VM may assert on "failed:
>>>>>>>> bad tag in log".
>>>>>>>>
>>>>>>>> When printing assembly in output.cpp we first take the ttylock,
>>>>>>>> print the head and then the method metadata. However, the
>>>>>>>> metadata printing makes a VM entry, may block for a safepoint,
>>>>>>>> and will then release the lock (break_tty_lock_for_safepoint).
>>>>>>>> After that, another compiler thread that hasn't safepointed will
>>>>>>>> take the lock, and the log is broken once the safepoint is over
>>>>>>>> and the first thread starts logging again.
>>>>>>>>
>>>>>>>> Solution:
>>>>>>>> Print the method metadata to a temporary buffer, then take the tty
>>>>>>>> lock.
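The idea can be sketched with standard C++ primitives. Here `std::mutex` stands in for the tty lock and a `std::ostringstream` for the temporary buffer; all names are hypothetical. The point is that whatever may block (in the VM, a VM entry that can reach a safepoint) runs before the lock is taken, so the lock is never released halfway through a tagged section.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for the VM's tty lock.
std::mutex tty_lock;

// Run `fill`, which may block, against a private buffer without holding the
// lock; then hold the lock only while copying the finished text out.
void print_buffered(std::ostream& out,
                    const std::function<void(std::ostream&)>& fill) {
  assert(fill);
  std::ostringstream buffer;
  fill(buffer);  // e.g. metadata printing that may reach a safepoint
  std::lock_guard<std::mutex> guard(tty_lock);
  out << buffer.str();
}
```

The cost is an extra copy of the text and a buffer that must be large enough (or resizable), which is what the follow-up discussion about stringStream sizing and ResourceMarks is about.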
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> Repro from bug stops failing.
>>>>>>>> Running :hotspot_all
>>>>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nils Eliasson
>>>>>>
>>>>
>>


From edward.nevill at gmail.com  Mon Apr 25 09:09:33 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 25 Apr 2016 10:09:33 +0100
Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <1461172110.2941.63.camel@mylittlepony.linaroharston>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
	<CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
	<57163063.3020506@redhat.com>
	<1461172110.2941.63.camel@mylittlepony.linaroharston>
Message-ID: <1461575373.10032.23.camel@mint>

Hi,


On Wed, 2016-04-20 at 18:08 +0100, Edward Nevill wrote:
> On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote:
> > On 04/19/2016 01:54 PM, Long Chen wrote:
> > > Would this be fine?
> > 
> > It might well be.  I'd like Ed to do a few measurements of large and
> > small block zeroing.  My guess is that a reasonably small unrolled loop
> > doing  STP ZR, ZR  will work better than anything else, but we'll see.
> 
> OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners' HW using 3 different methods.
> 

I have redone these benchmarks using a JMH test provided by Andrew Haley. Thanks Andrew!

The test is here

http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java

And the results are here

http://people.linaro.org/~edward.nevill/block_zero/zva1.pdf


As a reminder, the different patches are

http://people.linaro.org/~edward.nevill/block_zero/stp.patch

Uses stp instead of str (no use of dc zva)

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v01.patch

Long Chen's V01 patch

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v02.patch

Long Chen's V02 patch

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v03.patch

<zero single word to align base to 128 bit aligned address>
if (!small) {
  <zero remainder of first cache line using unrolled stp>
  <zero cache lines using dc zva>
}
<zero tail using unrolled stp>
<zero final word>
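A portable C++ sketch of the v03 structure above, for illustration only: `stp_zero` and `dc_zva` are hypothetical stand-ins (two 8-byte stores and a `memset` of one block), the 64-byte block size is an assumption (real code would read it from DCZID_EL0), and the "small" cutoff of two blocks mirrors the 2 x cache line cutoff discussed in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kZvaBytes = 64;  // ASSUMED DC ZVA block size
constexpr std::size_t kWordBytes = 8;
constexpr std::size_t kZvaWords = kZvaBytes / kWordBytes;
constexpr std::size_t kSmallWords = 2 * kZvaWords;  // below this, no DC ZVA

inline bool aligned_to(const void* p, std::size_t a) {
  return (reinterpret_cast<std::uintptr_t>(p) & (a - 1)) == 0;
}

// Stand-in for "stp zr, zr, [p]": zero two words with one paired store.
inline void stp_zero(std::uint64_t* p) { p[0] = 0; p[1] = 0; }

// Stand-in for "dc zva": zero one whole, block-aligned ZVA block.
inline void dc_zva(std::uint64_t* p) {
  assert(aligned_to(p, kZvaBytes));
  std::memset(p, 0, kZvaBytes);
}

// Zero `words` 8-byte words starting at the 8-byte-aligned address `base`.
void block_zero(std::uint64_t* base, std::size_t words) {
  assert(aligned_to(base, kWordBytes));
  std::uint64_t* p = base;
  std::size_t n = words;
  // Zero a single word to make p 16-byte (128-bit) aligned.
  if (n > 0 && !aligned_to(p, 16)) { *p++ = 0; --n; }
  if (n >= kSmallWords) {
    // Zero the remainder of the first block with paired stores.
    while (!aligned_to(p, kZvaBytes)) { stp_zero(p); p += 2; n -= 2; }
    // Zero whole blocks with DC ZVA (at least one, given the cutoff).
    while (n >= kZvaWords) { dc_zva(p); p += kZvaWords; n -= kZvaWords; }
  }
  // Zero the tail with paired stores, then the final odd word.
  while (n >= 2) { stp_zero(p); p += 2; n -= 2; }
  if (n == 1) { *p = 0; }
}
```

After the alignment step at most 6 words are needed to reach a block boundary, so a cutoff of 16 words always leaves room for at least one full block.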

http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04.patch

Long Chen's v02 patch modified to avoid unaligned stp instructions.


From this it seems that patches bzero3 and bzero4 perform better in all cases except very small zeroing (<= 16 bytes).

bzero3 is significantly larger than bzero4 and would probably need to be outlined.

Also, the cutoff point for switching from stp/str to dc zva is set at 2 cache lines (to guarantee there is at least 1 use of dc zva). A larger value may be better.

What I propose next is to look only at bzero3 and bzero4, to modify bzero3 to move the dc zva loop out of line, and to vary the stp/str to dc zva cutoff to determine the optimum.

Thoughts?


Ed.


From edward.nevill at gmail.com  Mon Apr 25 09:28:14 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 25 Apr 2016 10:28:14 +0100
Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <1461575373.10032.23.camel@mint>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
	<CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
	<57163063.3020506@redhat.com>
	<1461172110.2941.63.camel@mylittlepony.linaroharston>
	<1461575373.10032.23.camel@mint>
Message-ID: <1461576494.10032.31.camel@mint>

On Mon, 2016-04-25 at 10:09 +0100, Edward Nevill wrote:
> Hi,
> 
> 
> On Wed, 2016-04-20 at 18:08 +0100, Edward Nevill wrote:
> > On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote:
> > > On 04/19/2016 01:54 PM, Long Chen wrote:
> > > > Would this be fine?
> > > 
> > > It might well be.  I'd like Ed to do a few measurements of large and
> > > small block zeroing.  My guess is that a reasonably small unrolled loop
> > > doing  STP ZR, ZR  will work better than anything else, but we'll see.
> > 
> > OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners' HW using 3 different methods.
> > 
> 
> I have redone these benchmarks using a JMH test provided by Andrew Haley. Thanks Andrew!
> 
> The test is here
> 
> http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java
> 

One interesting data point is the interaction between zeroing memory and allocation prefetch. The following shows this.

http://people.linaro.org/~edward.nevill/block_zero/noprefetch.pdf

Performance is improved significantly by turning off allocation prefetch.

The problem is that allocation prefetch forces the cache line into L1.

The zero mem then has to wait until the cache line has loaded before it can zero it.

Therefore performance is much better turning off allocation prefetch altogether.

When allocation prefetch is turned off, the str/stp/dc zva just zeros L2 cache and L1 remains unaffected.

Also ZeroTlab should improve performance since this will reverse the order of the str/stp/dc zva and the prefetch.

However, I think the tuning of prefetch / zero tlab is a separate exercise.

All the best,
Ed.


From aph at redhat.com  Mon Apr 25 10:05:32 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 25 Apr 2016 11:05:32 +0100
Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <1461575373.10032.23.camel@mint>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
	<CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
	<57163063.3020506@redhat.com>
	<1461172110.2941.63.camel@mylittlepony.linaroharston>
	<1461575373.10032.23.camel@mint>
Message-ID: <571DEBEC.3050504@redhat.com>

On 04/25/2016 10:09 AM, Edward Nevill wrote:
> And the results are here
> 
> http://people.linaro.org/~edward.nevill/block_zero/zva1.pdf

Bigger numbers are worse, right?

Andrew.


From martin.doerr at sap.com  Mon Apr 25 11:04:28 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 25 Apr 2016 11:04:28 +0000
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
Message-ID: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>

Hi all,

we have already seen such an assertion in final_graph_reshaping.

We found 2 AddP nodes in a row. The ideal graph looked like this (simplified):
N0 ConP
N1 ConN
N2 AddP(Base = N0, Address = N0)
N3 AddP(Base = N0, Address = N2)

Final graph reshaping visited N2 before N3 and changed the graph:
N0 ConP
N1 ConN
N4 DecodeN
N2 AddP(Base = N4, Address = N4)
N3 AddP(Base = N0, Address = N2)

Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected.

I made a change to reconnect N3's Base input to N4, too.
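A toy model of the invariant and the fix, using a hypothetical `Node` struct that keeps only the Base and Address inputs (C2's real node classes differ): an AddP whose Address input is another AddP must have the same Base, so once N2 is rewired to N4, N3 has to be rewired as well.

```cpp
#include <cassert>
#include <string>

// Toy node (not C2's Node class): only the inputs needed for AddP.
struct Node {
  std::string op;
  Node* base = nullptr;     // AddP Base input
  Node* address = nullptr;  // AddP Address input
  bool is_addp() const { return op == "AddP"; }
};

// Invariant behind "Base pointers must match": an AddP whose Address is
// itself an AddP must use the same Base.
bool bases_match(const Node& n) {
  return !n.is_addp() || n.address == nullptr || !n.address->is_addp() ||
         n.address->base == n.base;
}

// Rewire one AddP's inputs from old_base to new_base.
void rebase(Node& addp, Node* old_base, Node* new_base) {
  assert(addp.is_addp());
  if (addp.base == old_base) addp.base = new_base;
  if (addp.address == old_base) addp.address = new_base;
}
```

Rebasing N2 alone breaks the invariant for N3; rebasing N3's Base to the same DecodeN restores it while its Address still points at N2.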

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/

In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row.
I wouldn't expect that to happen. Not sure if this assertion is desired.

I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive.
Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode.
(I believe X86 is the only platform which can match the decoding in the operand in this case.)

Please review. I will also need a sponsor, please.

Best regards,
Martin


From rwestrel at redhat.com  Mon Apr 25 13:18:06 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 25 Apr 2016 15:18:06 +0200
Subject: RFR(XS): 8155015: Aarch64: bad assert in spill generation code
Message-ID: <571E190E.5050600@redhat.com>

http://cr.openjdk.java.net/~roland/8155015/webrev.00/

I hit that broken assert when doing some testing.

Roland.

From tobias.hartmann at oracle.com  Mon Apr 25 13:26:00 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 25 Apr 2016 15:26:00 +0200
Subject: RFR(XS): 8155015: Aarch64: bad assert in spill generation code
In-Reply-To: <571E190E.5050600@redhat.com>
References: <571E190E.5050600@redhat.com>
Message-ID: <571E1AE8.4020000@oracle.com>

Hi Roland,

On 25.04.2016 15:18, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8155015/webrev.00/
> 
> I hit that broken assert when doing some testing.

That looks good to me!

Best regards,
Tobias

> 
> Roland.
> 

From rwestrel at redhat.com  Mon Apr 25 13:43:03 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 25 Apr 2016 15:43:03 +0200
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <c3d397f7-f909-4b65-42e6-78494e58fe9c@oracle.com>
References: <57188E03.5070303@redhat.com>
	<c3d397f7-f909-4b65-42e6-78494e58fe9c@oracle.com>
Message-ID: <571E1EE7.7090204@redhat.com>

Hi Dean,

Thanks for looking at this.

> Hi Roland.  This sounds like it has a lot of overlap with JDK-6217251. 
> If so, could you update JDK-6217251 explaining what more, if anything,
> needs to be done?

It's similar indeed, except that 6217251 suggests using knowledge of the
loop structures to trigger the optimization. My change assumes it's much
more common for the index of an array access to be loop variant while
the array itself is not. So while it could lead to suboptimal code in
the opposite case, that case is uncommon.

Roland.

> 
> thanks,
> 
> dl
> 
> On 4/21/2016 1:23 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8154826/webrev.00/
>>
>> The aarch64 port implicitly transforms:
>> (AddP base (AddP base address (LShiftL index con)) offset)
>> into:
>> (AddP base (AddP base offset) (LShiftL index con))
>> in the ad file to embed the shift (and possibly an i2l conversion) into
>> the addressing mode of a memory operation. Exposing that transformation
>> in the ideal graph allows:
>>
>> - (AddP base offset) to be scheduled (for instance outside a loop)
>> - multiple identical (AddP base offset) to be commoned
>> - (LShiftL index con) to be cloned during matching so that each memory
>> access has its own
>>
>> Roland.
> 

From rwestrel at redhat.com  Mon Apr 25 13:45:57 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 25 Apr 2016 15:45:57 +0200
Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64
In-Reply-To: <ae3bcef8-601e-16f9-5d6e-2669b1d366ce@oracle.com>
References: <571A1B48.8040304@redhat.com>
	<ae3bcef8-601e-16f9-5d6e-2669b1d366ce@oracle.com>
Message-ID: <571E1F95.9040309@redhat.com>

Thanks Vladimir and Michael for reviewing this.

FWIW, I'll investigate whether it makes sense for aarch64 to use unroll
analysis. If it does, then that fix isn't required. It might be nice to
have anyway. What do you think?

Roland.

From rwestrel at redhat.com  Mon Apr 25 13:50:53 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 25 Apr 2016 15:50:53 +0200
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <57188E03.5070303@redhat.com>
References: <57188E03.5070303@redhat.com>
Message-ID: <571E20BD.3030907@redhat.com>


On 04/21/2016 10:23 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8154826/webrev.00/
> 
> The aarch64 port implicitly transforms:
> (AddP base (AddP base address (LShiftL index con)) offset)
> into:
> (AddP base (AddP base offset) (LShiftL index con))
> in the ad file to embed the shift (and possibly an i2l conversion) into
> the addressing mode of a memory operation. Exposing that transformation
> in the ideal graph allows:
> 
> - (AddP base offset) to be scheduled (for instance outside a loop)
> - multiple identical (AddP base offset) to be commoned
> - (LShiftL index con) to be cloned during matching so that each memory
> access has its own
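The regrouping is plain address arithmetic: `address + (index << con) + offset` equals `(address + offset) + (index << con)`. A scalar C++ sketch of why that helps, with hypothetical names: once `address + offset` is its own value it no longer depends on the index, so it can be computed once and shared, leaving only the shifted index for the register-offset addressing mode (e.g. `ldr w0, [x1, x2, lsl #2]`) to absorb.

```cpp
#include <cassert>
#include <cstdint>

// Original grouping: the constant offset is added after the scaled index.
std::uintptr_t addr_original(std::uintptr_t address, std::uintptr_t offset,
                             std::uintptr_t index, unsigned con) {
  assert(con < 8 * sizeof(std::uintptr_t));
  return (address + (index << con)) + offset;
}

// Regrouped: (address + offset) does not depend on index, so in a loop it
// can be hoisted and commoned; only (index << con) remains per access.
std::uintptr_t addr_regrouped(std::uintptr_t address, std::uintptr_t offset,
                              std::uintptr_t index, unsigned con) {
  assert(con < 8 * sizeof(std::uintptr_t));
  const std::uintptr_t hoisted = address + offset;  // loop invariant
  return hoisted + (index << con);                  // fits [reg, reg, lsl #con]
}
```

Unsigned wrap-around makes both groupings agree for every input, which is what makes the reassociation safe.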

Further testing revealed some problems with the previous change so here
is a new webrev:

http://cr.openjdk.java.net/~roland/8154826/webrev.01/

Memory access instructions only accept a shift that matches the size of
the data being accessed (i.e. 2 for a 4 byte load). It's not always the
case that address expressions produced by c2 follow that constraint. As
expected, it's very rare that they don't, and it seems to occur only with
Unsafe. I added a predicate that skips the transformation of the address
subtree at the end of the optimization passes and prevents matching of
the address subtree if the constraint is not met for one use. As this is
uncommon, that seems good enough.
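A sketch of the kind of check described above, with hypothetical names (this is not the predicate from the webrev): the shift can only be folded into the addressing mode if it equals log2 of the access size for every memory access that uses the address; otherwise the transformation is skipped. It follows the wording above and ignores the shift-of-0 form that AArch64 register-offset addressing also accepts.

```cpp
#include <cassert>
#include <vector>

// log2 of a power-of-two access size in bytes (1, 2, 4, 8, 16).
int log2_size(int size_in_bytes) {
  assert(size_in_bytes > 0 && (size_in_bytes & (size_in_bytes - 1)) == 0);
  int log = 0;
  while ((1 << log) < size_in_bytes) ++log;
  return log;
}

// True if `shift` matches the access size of every use, so the
// (LShiftL index shift) subtree may be folded into each memory operation.
bool shift_fits_all_uses(int shift, const std::vector<int>& access_sizes) {
  for (int size : access_sizes) {
    if (log2_size(size) != shift) return false;
  }
  return true;
}
```

Requiring the match for all uses is what keeps a single odd access (such as one produced through Unsafe) from forcing a shift that the instruction cannot encode.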

Roland.

From tobias.hartmann at oracle.com  Mon Apr 25 14:11:54 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 25 Apr 2016 16:11:54 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <571DCE55.6070506@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
	<571DCE55.6070506@oracle.com>
Message-ID: <571E25AA.6090303@oracle.com>

Hi Mikael,

On 25.04.2016 09:59, Mikael Gerdin wrote:
> Hi Tobias
> 
> On 2016-04-25 08:38, Tobias Hartmann wrote:
>> Hi,
>>
>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).
> 
> I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned.
> 
> With compressed oops enabled, the mark word + compressed class pointer + array length
> add up to 16 bytes, and thus the array contents are guaranteed to be 8-byte aligned.
> If compressed klass pointers are disabled, the array header becomes 20 bytes, and I don't know how the contents are laid out in that case.

Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned.

Best regards,
Tobias

> 
> /Mikael
> 
>>
>> If there are no objections, I would like to push the basic version:
>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>
>> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>>
>> Thanks,
>> Tobias
>>
>> On 20.04.2016 15:31, Tobias Hartmann wrote:
>>> Hi John,
>>>
>>> On 20.04.2016 03:46, John Rose wrote:
>>>> So I started looking at your code and my inner SPARC junkie took over.
>>>>
>>>> This is what happened:
>>>>    http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>>
>>> Thanks a lot for having a look!
>>>
>>>> Perhaps there are some ideas that might be helpful:
>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>>
>>> Right, this simplifies the code a bit:
>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>>
>>> I did some experiments but, surprisingly, it seems that this does not improve but slightly degrades performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>>
>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>>
>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>>
>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>>
>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>>
>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>>
>>> What do you think?
>>>
>>> Thanks,
>>> Tobias
>>>
>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>> [3] Runtime alignment checks:
>>>
>>> bind(Lunaligned);
>>> Label next;
>>> xor3(ary1, ary2, tmp);
>>> and3(tmp, 7, tmp);
>>> br_null_short(tmp, Assembler::pn, next);
>>> STOP("One array is unaligned!");
>>> should_not_reach_here();
>>> bind(next);
>>> STOP("Both arrays are unaligned!");
>>>
>>>> On the other hand, what you wrote is nice and simple.
>>>>
>>>> HTH
>>>> -- John
>>>>
>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>>> But that's neither nice nor simple.
>>>>
>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>>
>>>>> Thanks, Vladimir!
>>>>>
>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>>>
>>>>> Okay, I'll push the basic version.
>>>>>
>>>>> For reference, here are the results on a SPARC T4:
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>>
>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>>   product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>>   product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>>
>>>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>>
>>>>> Right, but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" forces the prefetch distance to be 0:
>>>>>
>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>>> Error: Could not create the Java Virtual Machine.
>>>>> Error: A fatal exception has occurred. Program will exit
>>>>>
>>>>> Thanks,
>>>>> Tobias
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> please review the following enhancement:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>>
>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>>
>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read past the end of the arrays, bail out, and compare the remaining elements by shifting out the garbage bits.
>>>>>>>
>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>>>
>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>>
>>>>>>> I evaluated the following three versions of the patch.
>>>>>>>
>>>>>>> -- Basic --
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>
>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>>
>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>> Version "small" tries to improve this.
>>>>>>>
>>>>>>> -- Prefetching --
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>
>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>>
>>>>>>> -- Small --
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>>
>>>>>>> The numbers can be found here:
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>>
>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>
>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>>>>>>

From edward.nevill at gmail.com  Mon Apr 25 14:47:03 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 25 Apr 2016 15:47:03 +0100
Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <571DEBEC.3050504@redhat.com>
References: <CADwSXq8uw3Zh6HkbKTU6aoTnBTjteWxzxTJoBWKqV_m-4OZY6w@mail.gmail.com>
	<5714D930.4090804@redhat.com>
	<CADwSXq8ZLnNarac5aOgmJo_30WDKtHtJLnt13kPArMGu0D+iEw@mail.gmail.com>
	<57163063.3020506@redhat.com>
	<1461172110.2941.63.camel@mylittlepony.linaroharston>
	<1461575373.10032.23.camel@mint>  <571DEBEC.3050504@redhat.com>
Message-ID: <1461595623.10032.37.camel@mint>

On Mon, 2016-04-25 at 11:05 +0100, Andrew Haley wrote:
> On 04/25/2016 10:09 AM, Edward Nevill wrote:
> > And the results are here
> > 
> > http://people.linaro.org/~edward.nevill/block_zero/zva1.pdf
> 
> Bigger numbers are worse, right?
> 

Right, the original is normalised to 100%.

Regards,
Ed.


From martin.doerr at sap.com  Mon Apr 25 16:06:54 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 25 Apr 2016 16:06:54 +0000
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <571E20BD.3030907@redhat.com>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
Message-ID: <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>

Hi Roland,

It seems this issue is related to what I sent out today:
RFR(S): 8154836: VM crash due to "Base pointers must match"
I also had to change the AddP case of final graph reshaping.

In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap-based compressed oops mode.

Maybe I have to remove that part of my change, or at least adapt it.
We should make sure that the changes don't get pushed on the same day.

Best regards,
Martin

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin
Sent: Monday, 25 April 2016 15:51
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode


On 04/21/2016 10:23 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8154826/webrev.00/
> 
> The aarch64 port implicitly transforms:
> (AddP base (AddP base address (LShiftL index con)) offset)
> into:
> (AddP base (AddP base offset) (LShiftL index con))
> in the ad file to embed the shift (and possibly an i2l conversion) into
> the addressing mode of a memory operation. Exposing that transformation
> in the ideal graph allows:
> 
> - (AddP base offset) to be scheduled (for instance outside a loop)
> - multiple identical (AddP base offset) to be commoned
> - (LShiftL index con) to be cloned during matching so that each memory
> access has its own
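To illustrate why the reshaping is legal and useful, here is a small C++ model of an `int[]` element access, assuming a 16-byte array header and treating the address arithmetic as plain pointer arithmetic (so `address == base`). The function names are hypothetical. In the reshaped form, `base + header` no longer depends on the loop index and can be computed once, leaving only the scaled index, which AArch64 can fold into a `[base, index, lsl #2]` addressing mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Original shape: (AddP base (AddP base address (LShiftL i 2)) header), address == base
std::int32_t sum_original(const unsigned char* base, std::size_t header, int n) {
  std::int32_t sum = 0;
  for (int i = 0; i < n; ++i) {
    // Shifted index is added first, the constant header offset last.
    const unsigned char* p = (base + (static_cast<std::size_t>(i) << 2)) + header;
    std::int32_t v;
    std::memcpy(&v, p, sizeof v);
    sum += v;
  }
  return sum;
}

// Reshaped: (AddP base (AddP base address header) (LShiftL i 2))
std::int32_t sum_reshaped(const unsigned char* base, std::size_t header, int n) {
  const unsigned char* elems = base + header;  // loop invariant: computed once
  std::int32_t sum = 0;
  for (int i = 0; i < n; ++i) {
    // Only base-plus-scaled-index remains inside the loop.
    const unsigned char* p = elems + (static_cast<std::size_t>(i) << 2);
    std::int32_t v;
    std::memcpy(&v, p, sizeof v);
    sum += v;
  }
  return sum;
}
```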

Further testing revealed some problems with the previous change so here
is a new webrev:

http://cr.openjdk.java.net/~roland/8154826/webrev.01/

Memory access instructions only accept a shift that matches the size of
the data being accessed (i.e. a shift of 2 for a 4-byte load). It's not always the
case that address expressions produced by c2 follow that constraint. As
expected, it's very rare that they don't, and it seems to occur only with
Unsafe. I added a predicate that skips the transformation of the address
subtree at the end of the optimization passes and prevents matching of
the address subtree if the constraint is violated for any use. As this is
uncommon, that seems good enough.

Roland.

From christian.thalinger at oracle.com  Mon Apr 25 19:29:13 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 25 Apr 2016 09:29:13 -1000
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <571E25AA.6090303@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
	<571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com>
Message-ID: <DC5053CC-CE39-4FA1-BC7F-8C74EE40D761@oracle.com>


> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Mikael,
> 
> On 25.04.2016 09:59, Mikael Gerdin wrote:
>> Hi Tobias
>> 
>> On 2016-04-25 08:38, Tobias Hartmann wrote:
>>> Hi,
>>> 
>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).
>> 
>> I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned.
>> 
>> With compressed oops enabled, the mark word + compressed class + array length
>> add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned.
>> If compressed klass pointers are disabled, the array header will become 20 bytes and I don't know how the contents are laid out in that case.
> 
> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned.

See arrayOop.hpp:

  // Returns the offset of the first element.
  static int base_offset_in_bytes(BasicType type) {
    return header_size(type) * HeapWordSize;
  }

> 
> Best regards,
> Tobias
> 
>> 
>> /Mikael
>> 
>>> 
>>> If there are no objections, I would like to push the basic version:
>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>> 
>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>>> 
>>> Thanks,
>>> Tobias
>>> 
>>> On 20.04.2016 15:31, Tobias Hartmann wrote:
>>>> Hi John,
>>>> 
>>>> On 20.04.2016 03:46, John Rose wrote:
>>>>> So I started looking at your code and my inner SPARC junkie took over.
>>>>> 
>>>>> This is what happened:
>>>>>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>>> 
>>>> Thanks a lot for having a look!
>>>> 
>>>>> Perhaps there are some ideas that might be helpful:
>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>>> 
>>>> Right, this simplifies the code a bit:
>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>>> 
>>>> I did some experiments but, surprisingly, it seems that this does not improve but slightly degrades performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>>> 
>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>>> 
>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>>> 
>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>>> 
>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>>> 
>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>>> 
>>>> What do you think?
>>>> 
>>>> Thanks,
>>>> Tobias
>>>> 
>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>> [3] Runtime alignment checks:
>>>> 
>>>> bind(Lunaligned);
>>>> Label next;
>>>> xor3(ary1, ary2, tmp);
>>>> and3(tmp, 7, tmp);
>>>> br_null_short(tmp, Assembler::pn, next);
>>>> STOP("One array is unaligned!");
>>>> should_not_reach_here();
>>>> bind(next);
>>>> STOP("Both arrays are unaligned!");
>>>> 
>>>>> On the other hand, what you wrote is nice and simple.
>>>>> 
>>>>> HTH
>>>>> -- John
>>>>> 
>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>>>> But that's neither nice nor simple.
>>>>> 
>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>>> 
>>>>>> Thanks, Vladimir!
>>>>>> 
>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>>>> 
>>>>>> Okay, I'll push the basic version.
>>>>>> 
>>>>>> For reference, here are the results on a SPARC T4:
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>>> 
>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>>> 
>>>>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>>> 
>>>>>> Right, but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" forces the prefetch distance to be 0:
>>>>>> 
>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>>>> Error: Could not create the Java Virtual Machine.
>>>>>> Error: A fatal exception has occurred. Program will exit
>>>>>> 
>>>>>> Thanks,
>>>>>> Tobias
>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> please review the following enhancement:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>>> 
>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>>> 
>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read past the end of the arrays, bail out, and compare the remaining elements by shifting out the garbage bits.
>>>>>>>> 
>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>>>> 
>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>>> 
>>>>>>>> I evaluated the following three versions of the patch.
>>>>>>>> 
>>>>>>>> -- Basic --
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>> 
>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>>> 
>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>> Version "small" tries to improve this.
>>>>>>>> 
>>>>>>>> -- Prefetching --
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>> 
>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>>> 
>>>>>>>> -- Small --
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>>> 
>>>>>>>> The numbers can be found here:
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>>> 
>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>>> 
>>>>>>>> What do you think?
>>>>>>>> 
>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>> 
>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png

From vladimir.kozlov at oracle.com  Mon Apr 25 22:40:04 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 25 Apr 2016 15:40:04 -0700
Subject: CR for RFR 8154896
In-Reply-To: <C568518E7B433348B114B6A7122D474756E6469B@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E6469B@FMSMSX102.amr.corp.intel.com>
Message-ID: <571E9CC4.2010107@oracle.com>

Looks good. I will start testing.

Thanks,
Vladimir

On 4/23/16 7:14 PM, Berg, Michael C wrote:
> Hi Folks,
>
> I would like to contribute a bug fix for SKX/EVEX code gen. There is a
> guarantee of isBit(imm8) for jccb which can sometimes fail when upper
> bank register marshaling is required for instructions without EVEX or
> conditionally EVEX support on SKX.  This patch addresses the minimal set
> of changes which can have this issue.
>
> This code was tested as follows (see JBS entry below):
>
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896
>
>
> webrev:
>
> http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/
>
> Thanks,
>
> Michael
>
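For context, `jccb` emits the short form of an x86 conditional jump, whose displacement is a signed 8-bit value measured from the end of the 2-byte instruction, so any extra code emitted between the jump and its target (such as the register marshaling mentioned above) can push the distance out of range. A minimal C++ sketch of that range check follows; `fits_rel8` and `short_jcc_displacement` are hypothetical helpers, not the HotSpot assembler's own functions.

```cpp
#include <cassert>
#include <cstdint>

// True if disp can be encoded as a signed 8-bit displacement.
bool fits_rel8(std::intptr_t disp) {
  return disp >= -128 && disp <= 127;
}

// Displacement of a short Jcc at jump_pc: relative to the next instruction.
std::intptr_t short_jcc_displacement(std::intptr_t jump_pc, std::intptr_t target) {
  const std::intptr_t kShortJccSize = 2;  // opcode byte + rel8 byte
  return target - (jump_pc + kShortJccSize);
}
```

When the distance cannot be bounded at code-emission time, the usual remedy is the long form of the jump with a 32-bit displacement; whether the patch takes exactly that route is not stated here.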

From tobias.hartmann at oracle.com  Tue Apr 26 11:25:02 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 26 Apr 2016 13:25:02 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <DC5053CC-CE39-4FA1-BC7F-8C74EE40D761@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
	<571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com>
	<DC5053CC-CE39-4FA1-BC7F-8C74EE40D761@oracle.com>
Message-ID: <571F500E.5020904@oracle.com>

Thanks, Chris!

On 25.04.2016 21:29, Christian Thalinger wrote:
> 
>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>
>> Hi Mikael,
>>
>> On 25.04.2016 09:59, Mikael Gerdin wrote:
>>> Hi Tobias
>>>
>>> On 2016-04-25 08:38, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).
>>>
>>> I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned.
>>>
>>> With compressed oops enabled, the mark word + compressed class + array length
>>> add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned.
>>> If compressed klass pointers are disabled, the array header will become 20 bytes and I don't know how the contents are laid out in that case.
>>
>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned.
> 
> See arrayOop.hpp:
> 
>   // Returns the offset of the first element.
>   static int base_offset_in_bytes(BasicType type) {
>     return header_size(type) * HeapWordSize;
>   }

I verified that Mikael's claim is right:
With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4).
Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4).

However, the header size is always aligned to HeapWordSize:

static int header_size_in_bytes() {
  size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize);

and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64-bit systems, the first array element is always 8-byte aligned.
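The arithmetic above can be checked with a small C++ model. It assumes a 64-bit VM with an 8-byte mark word, a HeapWordSize of 8, and objects that start on 8-byte boundaries; `array_header_bytes` is a hypothetical simplification of arrayOopDesc::header_size_in_bytes(), not the HotSpot code.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kHeapWordSize = 8;  // 64-bit VM

// Round x up to a multiple of alignment (alignment must be a power of two).
constexpr std::size_t align_up(std::size_t x, std::size_t alignment) {
  return (x + alignment - 1) & ~(alignment - 1);
}

// mark word (8) + klass pointer (4 if compressed, else 8) + length (4),
// rounded up to HeapWordSize.
constexpr std::size_t array_header_bytes(bool compressed_class_pointers) {
  return align_up(8 + (compressed_class_pointers ? 4 : 8) + 4, kHeapWordSize);
}
```

Because the unaligned 20-byte sum is padded to 24, the element data starts at an 8-byte-aligned offset in both configurations, which matches the test results reported earlier in the thread.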

Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance:
http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/
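For readers not following the SPARC assembly, the core of the array_equals_loop() idea from the original RFR (8-byte loads that deliberately read past the end of the data, then discard the garbage bytes in the last word) can be sketched in portable C++. This is an illustration under stated assumptions, not the intrinsic: it assumes both buffers may be read up to the next multiple of 8 bytes (the intrinsic presumably can rely on heap objects being padded to 8-byte boundaries), and it masks the tail instead of shifting, so it does not depend on SPARC's big-endian byte order. `words_equal` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Precondition (assumption): a and b are readable up to the next multiple of
// 8 bytes past len; the padding bytes may hold arbitrary garbage.
bool words_equal(const unsigned char* a, const unsigned char* b, std::size_t len) {
  const std::size_t full = len / 8;  // complete 8-byte words
  const std::size_t rem = len % 8;   // valid bytes in the final partial word
  for (std::size_t i = 0; i < full; ++i) {
    std::uint64_t wa, wb;
    std::memcpy(&wa, a + i * 8, 8);  // memcpy avoids alignment/aliasing UB
    std::memcpy(&wb, b + i * 8, 8);
    if (wa != wb) return false;
  }
  if (rem == 0) return true;
  // Load the whole last word, including garbage beyond len ...
  std::uint64_t wa, wb;
  std::memcpy(&wa, a + full * 8, 8);
  std::memcpy(&wb, b + full * 8, 8);
  // ... then keep only the first rem bytes via a mask built byte by byte,
  // which works regardless of endianness.
  unsigned char mask_bytes[8] = {0};
  std::memset(mask_bytes, 0xFF, rem);
  std::uint64_t mask;
  std::memcpy(&mask, mask_bytes, 8);
  return ((wa ^ wb) & mask) == 0;
}
```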

Thanks,
Tobias

>>
>> Best regards,
>> Tobias
>>
>>>
>>> /Mikael
>>>
>>>>
>>>> If there are no objections, I would like to push the basic version:
>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>
>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>> On 20.04.2016 15:31, Tobias Hartmann wrote:
>>>>> Hi John,
>>>>>
>>>>> On 20.04.2016 03:46, John Rose wrote:
>>>>>> So I started looking at your code and my inner SPARC junkie took over.
>>>>>>
>>>>>> This is what happened:
>>>>>>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>>>>
>>>>> Thanks a lot for having a look!
>>>>>
>>>>>> Perhaps there are some ideas that might be helpful:
>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>>>>
>>>>> Right, this simplifies the code a bit:
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>>>>
>>>>> I did some experiments but, surprisingly, it seems that this does not improve but slightly degrades performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>>>>
>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>>>>
>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>>>>
>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>>>>
>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>>>>
>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Thanks,
>>>>> Tobias
>>>>>
>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>> [3] Runtime alignment checks:
>>>>>
>>>>> bind(Lunaligned);
>>>>> Label next;
>>>>> xor3(ary1, ary2, tmp);
>>>>> and3(tmp, 7, tmp);
>>>>> br_null_short(tmp, Assembler::pn, next);
>>>>> STOP("One array is unaligned!");
>>>>> should_not_reach_here();
>>>>> bind(next);
>>>>> STOP("Both arrays are unaligned!");
>>>>>
>>>>>> On the other hand, what you wrote is nice and simple.
>>>>>>
>>>>>> HTH
>>>>>> -- John
>>>>>>
>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>>>>> But that's neither nice nor simple.
>>>>>>
>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>>>>>
>>>>>>> Thanks, Vladimir!
>>>>>>>
>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>>>>>
>>>>>>> Okay, I'll push the basic version.
>>>>>>>
>>>>>>> For reference, here are the results on a SPARC T4:
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>>>>
>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>>>>
>>>>>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>>>>
>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>>>>>>
>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>>>>> Error: Could not create the Java Virtual Machine.
>>>>>>> Error: A fatal exception has occurred. Program will exit
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> please review the following enhancement:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>>>>
>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>>>>
>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>>>>>>>>
>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>>>>>
>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>>>>
>>>>>>>>> I evaluated the following three versions of the patch.
>>>>>>>>>
>>>>>>>>> -- Basic --
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>
>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>>>>
>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>> Version "small" tries to improve this.
>>>>>>>>>
>>>>>>>>> -- Prefetching --
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>
>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>>>>
>>>>>>>>> -- Small --
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>>>>
>>>>>>>>> The numbers can be found here:
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>>>>
>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tobias
>>>>>>>>>
>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
> 
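The read-past-the-end comparison described in the quoted RFR (8-byte steps when both arrays are doubleword aligned, otherwise 4-byte steps, then shifting out the garbage bits of the last partial word) can be sketched portably in C++. This is a hypothetical model, not the HotSpot MacroAssembler code: `load_be`, `choose_width` and `array_equals_sketch` are illustrative names, `load_be` emulates SPARC's big-endian loads so that the valid bytes of a partial final word are the high-order ones, and the caller is assumed to pad both buffers so the over-read stays in bounds (how the real intrinsic keeps the over-read safe is not shown here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load `width` bytes (4 or 8) as a big-endian word, as SPARC's lduw/ldx would.
static uint64_t load_be(const unsigned char* p, int width) {
  uint64_t v = 0;
  for (int k = 0; k < width; ++k) v = (v << 8) | p[k];
  return v;
}

// Use 8-byte steps only if both addresses are doubleword aligned, else 4.
static int choose_width(const void* a, const void* b) {
  uintptr_t bits = reinterpret_cast<uintptr_t>(a) | reinterpret_cast<uintptr_t>(b);
  return (bits & 7) == 0 ? 8 : 4;
}

// Compares len bytes of a and b.
// ASSUMPTION: both buffers are readable up to the next multiple of 8 past len.
bool array_equals_sketch(const unsigned char* a, const unsigned char* b, size_t len) {
  const int width = choose_width(a, b);
  size_t i = 0;
  for (; i + width <= len; i += width) {
    if (load_be(a + i, width) != load_be(b + i, width)) return false;
  }
  size_t rem = len - i;  // 0 .. width-1 trailing bytes still to compare
  if (rem == 0) return true;
  // Read a full word past the end; the garbage lands in the low-order bytes.
  uint64_t diff = load_be(a + i, width) ^ load_be(b + i, width);
  // Shift out the (width - rem) garbage bytes so only valid bytes are tested.
  return (diff >> ((width - rem) * 8)) == 0;
}
```

Because the garbage bytes always end up in the low-order positions of a big-endian word, a single right shift discards them without a separate per-element tail loop.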

From zoltan.majo at oracle.com  Tue Apr 26 11:56:06 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 26 Apr 2016 13:56:06 +0200
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <57195621.7050307@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
Message-ID: <571F5756.7020007@oracle.com>

Hi Vladimir,


thank you for the feedback! Please see comments below.

On 04/22/2016 12:37 AM, Vladimir Kozlov wrote:
> Hi, Zoltan
>
> I think we should change code in prefetch_allocation() instead:
>
>       Node *cache_adr = new AddPNode(old_eden_top, old_eden_top,
> _igvn.MakeConX(step_size + distance));

The problem is that AllocatePrefetchStyle=3 must align the first 
prefetched address to 8 bytes, otherwise the emitted stxa instruction 
could cause a SIGBUS. However, the alignment does not have to be to a 
step_size boundary; 8-byte alignment is sufficient.
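The address arithmetic behind the original crash, and behind the alignment argument above, can be sketched as follows. This is a simplified, hypothetical model of the first prefetch address (`first_prefetch_adr` is an illustrative name, not the macro.cpp code): the address is eden_top + distance rounded down with `adr & ~(alignment - 1)`. The example addresses are made up.

```cpp
#include <cassert>
#include <cstdint>

// Round x down to a multiple of `alignment` (must be a power of two),
// i.e. the cache_adr &= ~(alignment - 1) step from the bug description.
uintptr_t align_down(uintptr_t x, uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return x & ~(alignment - 1);
}

// Hypothetical model of the first prefetch address for AllocatePrefetchStyle=3.
uintptr_t first_prefetch_adr(uintptr_t eden_top, uintptr_t distance, uintptr_t alignment) {
  return align_down(eden_top + distance, alignment);
}
```

With a 64-byte step and distance 0, an eden_top of 0x1028 rounds down to 0x1000, i.e. before the newly allocated object. Requiring distance >= alignment (the webrev.00 approach) keeps the result at or above eden_top, since rounding down never subtracts a full alignment unit, while rounding to only 8 bytes leaves an 8-byte-aligned eden_top unchanged.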

>
> This way we allow AllocatePrefetchDistance == 0 in all 
> AllocatePrefetchStyle cases - it is consistent.

Unfortunately, it is not easy to have the same limit for 
AllocatePrefetchDistance on all platforms. Due to the 8-byte alignment 
performed by compiled code, the lower limit of 8 for 
AllocatePrefetchDistance is needed on SPARC; the lower limit of 0 works 
fine on all other platforms.

I've started looking at the consistency of flags controlling allocation 
prefetch in general, as we have other issues open related to them (e.g., 
JDK-8151622). We're touching related code now, so I thought, maybe it 
makes sense to fix all remaining issues at once.

The updated webrev does the following (in addition to fixing the 
original problem with AllocatePrefetchDistance):

1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., 
BIS instructions are used for prefetching). As far as I understand, 
AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so 
if BIS is enabled, we should use AllocatePrefetchStyle = 3.

2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on 
non-SPARC platforms.

3. Enforce that AllocatePrefetchStepSize is a multiple of 8 if 
AllocatePrefetchStyle is 3 (due to alignment requirements).

4. Determine the number of lines to prefetch in the same way for all 
prefetch styles:
lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : 
AllocatePrefetchLines
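The flag rules listed above can be summarized as a small consistency check. This is a hypothetical sketch: `PrefetchFlags`, `check_prefetch_flags` and `lines_to_prefetch` are illustrative names, not HotSpot's flag-constraint functions, and whether the webrev rejects a bad combination or adjusts it ergonomically is not shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the allocation prefetch flags.
struct PrefetchFlags {
  int style;           // AllocatePrefetchStyle
  int instr;           // AllocatePrefetchInstr (1 = BIS on SPARC)
  int step_size;       // AllocatePrefetchStepSize, in bytes
  int lines;           // AllocatePrefetchLines
  int instance_lines;  // AllocateInstancePrefetchLines
};

// Returns an empty string if the combination follows the proposed rules,
// otherwise a description of the first violated rule.
std::string check_prefetch_flags(const PrefetchFlags& f, bool is_sparc) {
  assert(f.step_size > 0);
  if (f.instr == 1 && f.style != 3)
    return "AllocatePrefetchInstr=1 (BIS) requires AllocatePrefetchStyle=3";
  if (f.style == 3 && !is_sparc)
    return "AllocatePrefetchStyle=3 is only supported on SPARC";
  if (f.style == 3 && f.step_size % 8 != 0)
    return "AllocatePrefetchStepSize must be a multiple of 8 with AllocatePrefetchStyle=3";
  return "";
}

// The same line-count selection for every prefetch style.
int lines_to_prefetch(const PrefetchFlags& f, bool instance_allocation) {
  return instance_allocation ? f.instance_lines : f.lines;
}
```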

Here is the updated webrev:
http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/

Testing:
- JPRT (incl. TestOptionsWithRanges)
- local testing on a SPARC machine.

Thank you!

Best regards,


Zoltan


>
> Thanks,
> Vladimir
>
> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>> Hi,
>>
>>
>> please review the patch for 8153340.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>
>>
>> Problem: The VM crashes if AllocatePrefetchStyle==3 and 
>> AllocatePrefetchDistance==0. The crash happens due to the way the 
>> address for the first prefetch instruction is calculated [1]:
>>
>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= 
>> ~(AllocatePrefetchStepSize - 1) which can zero some of the bits of 
>> cache_adr. That results in accesses *before* the newly allocated object.
>>
>>
>> Solution: Set lower limit of AllocatePrefetchDistance to 
>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). 
>> Unquarantine test.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>
>> Testing:
>> - JPRT (incl. TestOptionsWithRanges.java)
>> - local testing on a SPARC machine.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>> [1] 
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941


From rwestrel at redhat.com  Tue Apr 26 14:29:50 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 26 Apr 2016 16:29:50 +0200
Subject: SuperWord::unrolling_analysis() question
Message-ID: <dk67ffko335.fsf@rwestrel.remote.csb>


Why does SuperWord::unrolling_analysis() use:

int max_vector = Matcher::max_vector_size(T_INT);

instead of:

int max_vector = Matcher::max_vector_size(T_BYTE);

?

For a loop like this:

static void test_byte(byte[] src, byte[] dst) {
    for (int i = 0; i < src.length; i++) {
        dst[i] = src[i];
    } 
}


It limits the size of the vectors that are used below what the hardware
can support.

Roland.
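The size difference behind this question is simple arithmetic: the element count per vector depends on the element type. The sketch below is illustrative only; `max_vector_elems` and `ElemSize` are hypothetical names modeled on the idea of Matcher::max_vector_size(bt) (vector width in bytes divided by element size), not its actual implementation. Assuming 64-byte (AVX-512) vectors, a cap based on T_INT allows 16 elements, while T_BYTE would allow 64.

```cpp
#include <cassert>

// Element sizes in bytes, mirroring T_BYTE, T_SHORT, T_INT and T_LONG.
enum ElemSize { kByte = 1, kShort = 2, kInt = 4, kLong = 8 };

// Number of elements of the given size that fit in one vector register.
constexpr int max_vector_elems(int vector_bytes, ElemSize elem) {
  return vector_bytes / static_cast<int>(elem);
}
```

If a byte copy loop like test_byte() is limited to the T_INT count, it covers only 16 bytes per vector instead of the 64 that the hardware supports.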


From andreas.woess at oracle.com  Fri Apr 22 14:36:07 2016
From: andreas.woess at oracle.com (Andreas Woess)
Date: Fri, 22 Apr 2016 16:36:07 +0200
Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable
	array fields
Message-ID: <571A36D7.2030701@oracle.com>

Please review:
http://cr.openjdk.java.net/~aw/8154765/webrev/
https://bugs.openjdk.java.net/browse/JDK-8154765

This change adds an optional stableDimensions parameter to 
ConstantReflectionProvider.readStableFieldValue that allows the number 
of stable array dimensions to be specified at a finer granularity than 
by inferring it from the type of the field.

Thanks,
Andreas


From andreas.woess at oracle.com  Tue Apr 26 15:42:17 2016
From: andreas.woess at oracle.com (Andreas Woess)
Date: Tue, 26 Apr 2016 17:42:17 +0200
Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable
	array fields
Message-ID: <571F8C59.8050707@oracle.com>

Please review:
http://cr.openjdk.java.net/~aw/8154765/webrev/
https://bugs.openjdk.java.net/browse/JDK-8154765

This change adds an optional stableDimensions parameter to 
ConstantReflectionProvider.readStableFieldValue that allows the number 
of stable array dimensions to be specified at a finer granularity than 
by inferring it from the type of the field.

Thanks,
Andreas


From christian.thalinger at oracle.com  Tue Apr 26 17:11:09 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 26 Apr 2016 07:11:09 -1000
Subject: CR for RFR 8154896
In-Reply-To: <C568518E7B433348B114B6A7122D474756E6469B@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E6469B@FMSMSX102.amr.corp.intel.com>
Message-ID: <EEA4AA14-026A-4301-804F-42E6A13EB1CD@oracle.com>


> On Apr 23, 2016, at 4:14 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> Hi Folks,
> 
> I would like to contribute a bug fix for SKX/EVEX code gen.

The bug fix is the change in src/cpu/x86/vm/assembler_x86.cpp, correct?

>   There is a guarantee of isBit(imm8) for jccb which can sometimes fail when upper bank register marshaling is required for instructions without EVEX or conditionally EVEX support on SKX.  This patch addresses the minimal set of changes which can have this issue.
>  
> This code was tested as follows (see jbs entry below):
> 
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896
> 
> webrev:
> http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/
>  
> Thanks,
> Michael


From vladimir.kozlov at oracle.com  Tue Apr 26 17:56:04 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 26 Apr 2016 10:56:04 -0700
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <dk67ffko335.fsf@rwestrel.remote.csb>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
Message-ID: <3e4d91ad-3391-8c73-0be6-ebeddb811613@oracle.com>

The question is for Michael. I would say it is a typo, but let Michael answer.

Thanks,
Vladimir

On 4/26/16 7:29 AM, Roland Westrelin wrote:
>
> Why does SuperWord::unrolling_analysis() use:
>
> int max_vector = Matcher::max_vector_size(T_INT);
>
> instead of:
>
> int max_vector = Matcher::max_vector_size(T_BYTE);
>
> ?
>
> For a loop like this:
>
> static void test_byte(byte[] src, byte[] dst) {
>     for (int i = 0; i < src.length; i++) {
>         dst[i] = src[i];
>     }
> }
>
>
> It limits the size of the vectors that are used below what the hardware
> can support.
>
> Roland.
>

From michael.c.berg at intel.com  Tue Apr 26 18:15:58 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Tue, 26 Apr 2016 18:15:58 +0000
Subject: CR for RFR 8154896
In-Reply-To: <EEA4AA14-026A-4301-804F-42E6A13EB1CD@oracle.com>
References: <C568518E7B433348B114B6A7122D474756E6469B@FMSMSX102.amr.corp.intel.com>
	<EEA4AA14-026A-4301-804F-42E6A13EB1CD@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E652C4@FMSMSX102.amr.corp.intel.com>

No, the actual bug fix is in the jccb to jcc changes.
The assembler change is a correction for compressed displacement.

-Michael
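Why a jccb can fail while a jcc cannot comes down to the x86 encoding: a short Jcc is 2 bytes (opcode 0x70+cc plus a signed 8-bit displacement measured from the end of the instruction), whereas the near form uses a 32-bit displacement. If extra instructions (such as the upper-bank register marshaling mentioned above) are emitted between the branch and its label, the distance can exceed the 8-bit range. The helper names below are hypothetical; this only models the range check, not HotSpot's assembler.

```cpp
#include <cassert>
#include <cstdint>

// True if v fits in a signed 8-bit immediate (-128 .. 127).
constexpr bool fits_in_imm8(std::intptr_t v) { return v >= -128 && v <= 127; }

// Size of a short conditional jump: opcode byte plus rel8.
constexpr int kShortJccSize = 2;

// Can a short jump at insn_pc reach target_pc? rel8 is relative to the
// address of the next instruction.
constexpr bool short_jcc_reaches(std::intptr_t insn_pc, std::intptr_t target_pc) {
  return fits_in_imm8(target_pc - (insn_pc + kShortJccSize));
}
```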

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Tuesday, April 26, 2016 10:11 AM
To: Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: CR for RFR 8154896


On Apr 23, 2016, at 4:14 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:

Hi Folks,

I would like to contribute a bug fix for SKX/EVEX code gen.

The bug fix is the change in src/cpu/x86/vm/assembler_x86.cpp, correct?


  There is a guarantee of isBit(imm8) for jccb which can sometimes fail when upper bank register marshaling is required for instructions without EVEX or conditionally EVEX support on SKX.  This patch addresses the minimal set of changes which can have this issue.

This code was tested as follows (see jbs entry below):

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896

webrev:
http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/

Thanks,
Michael


From vivek.r.deshpande at intel.com  Tue Apr 26 18:22:43 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Tue, 26 Apr 2016 18:22:43 +0000
Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
In-Reply-To: <C568518E7B433348B114B6A7122D474756E646B3@FMSMSX102.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com>
	<571AD5BC.10309@oracle.com>
	<C568518E7B433348B114B6A7122D474756E644EF@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E646B3@FMSMSX102.amr.corp.intel.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A893B5@ORSMSX106.amr.corp.intel.com>

Hi Vladimir

I have updated the webrev with all suggested changes.
The webrev is at this location:
http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.01/

Regards
Vivek

-----Original Message-----
From: Berg, Michael C 
Sent: Saturday, April 23, 2016 7:24 PM
To: Vladimir Kozlov; Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512

More info:

Setvectmask and restorevectmask are used for auto code generation.

The method attached here is used entirely for stub code assembly and is intended only for stub code usage and checked via assertions as such.

Regards,
Michael

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Friday, April 22, 2016 7:30 PM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512

Vladimir, it is not related to 8153998.
It seems his patch is not sync'd against the head of the tree.

-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, April 22, 2016 6:54 PM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Berg, Michael C <michael.c.berg at intel.com>
Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512

Hi Vivek,

How is it related to the following macro instructions from JDK-8153998?

+  // special instructions for EVEX
+  void setvectmask(Register dst, Register src);
+  void restorevectmask();

Can you reuse them, or add variants that you can use? I see a difference in the code: kmovql vs. kmovdl.

_programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking


I don't like the following code in the assembler instructions:
+  if (zeroing) attributes.set_is_clear_context();

+  if (!no_reg_mask) {
+    attributes.set_embedded_opmask_register_specifier(mask);
+    if (zeroing) attributes.set_is_clear_context();
+  }

zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them.
_embedded_opmask_register_specifier is not used (only set). Don't add values which are not used.

Thanks,
Vladimir

On 4/22/16 6:10 PM, Deshpande, Vivek R wrote:
> Hi all
>
> I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic.
>
> Could you please review and sponsor this patch.
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154975
>
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/
>
> Thanks and regards,
>
> Vivek
>

From michael.c.berg at intel.com  Tue Apr 26 18:27:42 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Tue, 26 Apr 2016 18:27:42 +0000
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <dk67ffko335.fsf@rwestrel.remote.csb>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
Message-ID: <C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>

Roland,

The answer could be conditional if we had machines with enough byte or short components to build vectors from. I chose INT because it is currently the consistent minimum configuration for complete vector mapping. The best answer would be to write code that mines the common type used in the current loop's expressions, but I think we would then be stuck with two passes over the code: the first to bind the common type, the second to find the optimal sub-vector mapping. Alternatively, the question could move to the machine layer as a query, where compiler writers choose the minimum consistent configuration based on current information about the machine we compile on.

-Michael
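One possible shape for the two-pass idea above is sketched below. It is purely hypothetical: the policy (taking the smallest element size among the loop's memory operations as the "common type") is an assumption for illustration, and `common_elem_size` / `lanes_for_loop` are not SuperWord functions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pass 1 (assumed policy): bind the common type as the smallest element
// size, in bytes, among the loop's vectorizable memory operations.
int common_elem_size(const std::vector<int>& elem_sizes) {
  assert(!elem_sizes.empty());
  return *std::min_element(elem_sizes.begin(), elem_sizes.end());
}

// Pass 2 input: how many elements of that type fit in one vector of the
// machine's width, instead of always using the T_INT count.
int lanes_for_loop(const std::vector<int>& elem_sizes, int vector_bytes) {
  return vector_bytes / common_elem_size(elem_sizes);
}
```

For the byte copy loop in the question, this would yield 64 lanes on a 64-byte vector machine rather than the 16 derived from T_INT; loops that mix element sizes would still need the second pass to decide the sub-vector mapping.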

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin
Sent: Tuesday, April 26, 2016 7:30 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: SuperWord::unrolling_analysis() question


Why does SuperWord::unrolling_analysis() use:

int max_vector = Matcher::max_vector_size(T_INT);

instead of:

int max_vector = Matcher::max_vector_size(T_BYTE);

?

For a loop like this:

static void test_byte(byte[] src, byte[] dst) {
    for (int i = 0; i < src.length; i++) {
        dst[i] = src[i];
    } 
}


It limits the size of the vectors that are used below what the hardware
can support.

Roland.


From christian.thalinger at oracle.com  Tue Apr 26 19:27:06 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 26 Apr 2016 09:27:06 -1000
Subject: CR for RFR 8154896
In-Reply-To: <C568518E7B433348B114B6A7122D474756E652C4@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474756E6469B@FMSMSX102.amr.corp.intel.com>
	<EEA4AA14-026A-4301-804F-42E6A13EB1CD@oracle.com>
	<C568518E7B433348B114B6A7122D474756E652C4@FMSMSX102.amr.corp.intel.com>
Message-ID: <BC13F240-8124-476F-9367-5705046EC8FC@oracle.com>


> On Apr 26, 2016, at 8:15 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> No, the actual bug fix is in the jccb to jcc changes.
> The assembler change is a correction for compressed displacement.

A "correction"? :-)  Anyway, looks good.

>  
> -Michael
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] 
> Sent: Tuesday, April 26, 2016 10:11 AM
> To: Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8154896
>  
>  
> On Apr 23, 2016, at 4:14 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>  
> Hi Folks,
> 
> I would like to contribute a bug fix for SKX/EVEX code gen.
>  
> The bug fix is the change in src/cpu/x86/vm/assembler_x86.cpp, correct?
> 
> 
>   There is a guarantee of isBit(imm8) for jccb which can sometimes fail when upper bank register marshaling is required for instructions without EVEX or conditionally EVEX support on SKX.  This patch addresses the minimal set of changes which can have this issue.
>  
> This code was tested as follows (see jbs entry below):
> 
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896
> 
> webrev:
> http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/
>  
> Thanks,
> Michael


From christian.thalinger at oracle.com  Tue Apr 26 20:33:38 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 26 Apr 2016 10:33:38 -1000
Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable
	array fields
In-Reply-To: <571A36D7.2030701@oracle.com>
References: <571A36D7.2030701@oracle.com>
Message-ID: <BD3F2BD6-9A39-4629-9931-C03E48F08B98@oracle.com>

Looks good.

> On Apr 22, 2016, at 4:36 AM, Andreas Woess <andreas.woess at oracle.com> wrote:
> 
> Please review:
> http://cr.openjdk.java.net/~aw/8154765/webrev/
> https://bugs.openjdk.java.net/browse/JDK-8154765
> 
> This change adds an optional stableDimensions parameter to ConstantReflectionProvider.readStableFieldValue that allows the number of stable array dimensions to be specified at a finer granularity than by inferring it from the type of the field.
> 
> Thanks,
> Andreas
> 


From vladimir.kozlov at oracle.com  Tue Apr 26 22:39:14 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 26 Apr 2016 15:39:14 -0700
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <571F5756.7020007@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
	<571F5756.7020007@oracle.com>
Message-ID: <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>

On 4/26/16 4:56 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> thank you for the feedback! Please see comments below.
>
> On 04/22/2016 12:37 AM, Vladimir Kozlov wrote:
>> Hi, Zoltan
>>
>> I think we should change code in prefetch_allocation() instead:
>>
>>       Node *cache_adr = new AddPNode(old_eden_top, old_eden_top,
>> _igvn.MakeConX(step_size + distance));
>
> The problem is that AllocatePrefetchStyle must align the first prefetched address to 8 bytes, otherwise the emitted stxa
> instruction could cause a SIGBUS. But the alignment does not have to be at step_size boundary, 8-byte alignment is
> sufficient.

Actually it has to be step_size (cache line size) aligned - the BIS instruction is triggered when the address of stxa is the 
beginning of a cache line for AllocatePrefetchStyle == 3. If it is only 8-byte aligned, it will be a simple store.
We should require AllocatePrefetchStepSize to be 8 bytes aligned in vm_version_sparc.cpp instead of:
+       // BIS instructions require 8-byte aligned addresses
+       Node* mask = _igvn.MakeConX(~(intptr_t)(wordSize - 1));
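The store behavior described in this reply can be modeled as a small classifier. This is a hypothetical sketch of the rules as stated here (stxa needs 8-byte alignment, and only an address at the start of a cache line acts as a block-initializing store), not a statement of the SPARC specification; `StxaEffect` and `classify_bis_store` are illustrative names and the example addresses are made up.

```cpp
#include <cassert>
#include <cstdint>

enum class StxaEffect { kMisaligned, kPlainStore, kBlockInitStore };

// Classify a BIS-style stxa to `adr`, assuming a power-of-two cache line size.
StxaEffect classify_bis_store(std::uintptr_t adr, std::uintptr_t line_size) {
  assert(line_size >= 8 && (line_size & (line_size - 1)) == 0);
  if (adr % 8 != 0) return StxaEffect::kMisaligned;               // stxa, like stx, needs 8-byte alignment
  if (adr % line_size == 0) return StxaEffect::kBlockInitStore;   // start of a cache line
  return StxaEffect::kPlainStore;                                 // aligned, but mid-line
}
```

Under this model, a step size that is not a multiple of 8 (for example 36) would make the second store in a sequence starting at 0x1040 land at 0x1064, which is misaligned; this is the case the AllocatePrefetchStepSize constraint rules out.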


>
>>
>> This way we allow AllocatePrefetchDistance == 0 in all AllocatePrefetchStyle cases - it is consistent.
>
> Unfortunately, it is not easy to have the same limit for AllocatePrefetchDistance on all platforms. Due to the 8-byte
> alignment performed by compiled code, the lower limit of 8 for AllocatePrefetchDistance is needed on SPARC; the lower
> limit of 0 works fine on all other platforms.
>
> I've started looking at the consistency of flags controlling allocation prefetch in general, as we have other issues
> open related to them (e.g., JDK-8151622). We're touching related code now, so I thought, maybe it makes sense to fix all
> remaining issues at once.

Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - it should be used only if UseTLAB is true. Please take a look.
And I think Abstract_VM_Version::_reserve_for_allocation_prefetch should be set for all styles on all platforms to avoid 
accessing beyond the heap. The documentation for prefetch instructions says that they do not trap, but we should be careful.

>
> The updated webrev does the following (in addition to fixing the original problem with AllocatePrefetchDistance):
>
> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., BIS instructions are used for prefetching). As
> far as I understand, AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so if BIS is enabled, we
> should use AllocatePrefetchStyle = 3.

Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select AllocatePrefetchStyle = 3.
But we should allow AllocatePrefetchStyle = 3 if normal prefetch instructions (or other platforms) are used.
I think we should update the comment in globals.hpp to say "generate one prefetch per cache line" without mentioning BIS.

But I agree that if BIS is not available we should not use BIS (AllocatePrefetchInstr = 1).

>
> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on non-SPARC platforms.

It could be useful on other platforms since it does one access per cache line.

>
> 3. Enforce that AllocatePrefetchStepSize is a multiple of 8 if AllocatePrefetchStyle is 3 (due to alignment requirements).

That is correct since stxa requires at least 8 bytes alignment (as stx).

>
> 4. Determine the number of lines to prefetch in the same way for all prefetch styles:
> lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines

Agree.

>
> Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/

vm_version_sparc.cpp
AllocatePrefetchInstr = 0 should be set for all styles (not only 1) when BIS is not available.

Thanks,
Vladimir

>
> Testing:
> - JPRT (incl. TestOptionsWithRanges)
> - local testing on a SPARC machine.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>>> Hi,
>>>
>>>
>>> please review the patch for 8153340.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>>
>>>
>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way
>>> the address for the first prefetch instruction is calculated [1]:
>>>
>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1) which can zero some of
>>> the bits of cache_adr. That results in accesses *before* the newly allocated object.
>>>
>>>
>>> Solution: Set lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3).
>>> Unquarantine test.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>>
>>> Testing:
>>> - JPRT (incl. TestOptionsWithRanges.java)
>>> - local testing on a SPARC machine.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941
>
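A small integer model of the address arithmetic described in the report above may help show why distance==0 goes wrong and why a lower limit of one step fixes it. The function names first_prefetch_addr, touches_before_object and style3_flags_ok are hypothetical; the real code in macro.cpp builds ideal-graph nodes rather than computing integers. The sketch assumes AllocatePrefetchStepSize is a power of two, so the mask rounds down to a step boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the first prefetch address starts at
// old_eden_top + distance and is then rounded down to a multiple of
// step_size (assumed to be a power of two).
uintptr_t first_prefetch_addr(uintptr_t old_eden_top, uintptr_t distance,
                              uintptr_t step_size) {
  uintptr_t cache_adr = old_eden_top + distance;
  cache_adr &= ~(step_size - 1);  // rounding can move the address *down*
  return cache_adr;
}

// True if the first prefetch would touch memory below the new object.
bool touches_before_object(uintptr_t old_eden_top, uintptr_t distance,
                           uintptr_t step_size) {
  return first_prefetch_addr(old_eden_top, distance, step_size) < old_eden_top;
}

// Sketch of the style-3 constraints discussed in this thread: the step
// must be a multiple of 8 (stxa alignment) and the distance at least
// one step, so rounding down cannot reach below old_eden_top.
bool style3_flags_ok(uintptr_t distance, uintptr_t step_size) {
  return step_size % 8 == 0 && distance >= step_size;
}
```

Rounding down discards at most step_size - 1 bytes, so with distance >= step_size the first prefetch address stays strictly above old_eden_top.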

From vladimir.kozlov at oracle.com  Wed Apr 27 01:03:53 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 26 Apr 2016 18:03:53 -0700
Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64
In-Reply-To: <571E1F95.9040309@redhat.com>
References: <571A1B48.8040304@redhat.com>
	<ae3bcef8-601e-16f9-5d6e-2669b1d366ce@oracle.com>
	<571E1F95.9040309@redhat.com>
Message-ID: <57200FF9.5000100@oracle.com>

The fix is general and helps other platforms. I am waiting for the testing results (testing was interrupted this weekend and yesterday by quarterly maintenance). I will push it when I get the results.

Thanks,
Vladimir

On 4/25/16 6:45 AM, Roland Westrelin wrote:
> Thanks, Vladimir and Michael, for reviewing this.
>
> FWIW, I'll investigate whether it makes sense for aarch64 to use unroll
> analysis. If it does, then that fix isn't required. It might be nice to
> have anyway. What do you think?
>
> Roland.
>

From vladimir.kozlov at oracle.com  Wed Apr 27 01:47:12 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 26 Apr 2016 18:47:12 -0700
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
In-Reply-To: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
References: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
Message-ID: <57201A20.7000504@oracle.com>

Thank you, Martin

On 4/25/16 4:04 AM, Doerr, Martin wrote:
> Hi all,
>
> we have already seen such an assertion in final_graph_reshaping.
>
> We found 2 AddP nodes in a row. The ideal graph looked like this (simplified):
> N0 ConP
> N1 ConN
> N2 AddP(Base = N0, Address = N0)
> N3 AddP(Base = N0, Address = N2)
>
> Final graph reshaping visited N2 first (before N3) and changed the graph:
> N0 ConP
> N1 ConN
> N4 DecodeN
> N2 AddP(Base = N4, Address = N4)
> N3 AddP(Base = N0, Address = N2)
>
> Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected.
>
> I made a change to reconnect N3's Base input to N4, too.
>
> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/

Fix looks good.

>
> In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row.
> I wouldn't expect that to happen. Not sure if this assertion is desired.

We should not have a 3rd AddP, but I agree with the assert. You should add an additional check to the assert:

|| out_j->in(AddPNode::Base) != addp

>
> I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive.
> Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode.
> (I believe X86 is the only platform which can match the decoding in the operand in this case.)

It is not related to the fix and should be done separately, or not at all.
There is a comment above the code which explains why it could be beneficial on SPARC too:

2845       // On sparc loading 32-bits constant and decoding it have less
2846       // instructions (4) then load 64-bits constant (7).

Thanks,
Vladimir

>
> Please review. I will also need a sponsor, please.
>
> Best regards,
> Martin
>
>
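The graph rewrite Martin describes can be reproduced with a toy model. Node, reshape_addp and bases_match are hypothetical names and this is not C2's Node class; DecodeN stands in for the decoded narrow constant (N4), and only the Base/Address inputs of AddP are modeled. With fix_users == false the chained AddP (N3) keeps its old Base and the "base pointers must match" invariant breaks; reconnecting N3's Base as well restores it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an ideal-graph node (not C2's Node class).
struct Node {
  std::string kind;  // "ConP", "DecodeN", "AddP", ...
  Node* base;        // AddP Base input
  Node* address;     // AddP Address input
  explicit Node(std::string k, Node* b = nullptr, Node* a = nullptr)
      : kind(std::move(k)), base(b), address(a) {}
};

// Rewire 'addp' to use 'decoded' instead of the pointer constant 'con'.
// With fix_users, also rewire AddP users whose Address is 'addp' and
// whose Base is still 'con' (the change proposed in the review).
void reshape_addp(Node* addp, Node* con, Node* decoded,
                  const std::vector<Node*>& addp_users, bool fix_users) {
  if (addp->base == con) addp->base = decoded;
  if (addp->address == con) addp->address = decoded;
  if (!fix_users) return;
  for (Node* user : addp_users) {
    if (user->address == addp && user->base == con) user->base = decoded;
  }
}

// Invariant behind the assertion: a chained AddP must share its Base
// with the AddP it uses as its Address.
bool bases_match(const Node* outer) {
  const Node* inner = outer->address;
  return inner == nullptr || inner->kind != "AddP" || inner->base == outer->base;
}
```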

From vladimir.kozlov at oracle.com  Wed Apr 27 05:02:55 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 26 Apr 2016 22:02:55 -0700
Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A893B5@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com>
	<571AD5BC.10309@oracle.com>
	<C568518E7B433348B114B6A7122D474756E644EF@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E646B3@FMSMSX102.amr.corp.intel.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A893B5@ORSMSX106.amr.corp.intel.com>
Message-ID: <572047FF.8000903@oracle.com>

Looks good to me. I will start testing.

Thanks,
Vladimir

On 4/26/16 11:22 AM, Deshpande, Vivek R wrote:
> HI Vladimir
>
> I have updated the webrev with all suggested changes.
> The webrev is at this location:
> http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.01/
>
> Regards
> Vivek
>
> -----Original Message-----
> From: Berg, Michael C
> Sent: Saturday, April 23, 2016 7:24 PM
> To: Vladimir Kozlov; Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
>
> More info:
>
> Setvectmask and restorevectmask are used for auto code generation.
>
> The method attached here is used entirely for stub code assembly and is intended only for stub code usage and checked via assertions as such.
>
> Regards,
> Michael
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
> Sent: Friday, April 22, 2016 7:30 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
>
> Vladimir, it is not related to 8153998.
> It seems his patch is not sync'd against the head of the tree.
>
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, April 22, 2016 6:54 PM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Berg, Michael C <michael.c.berg at intel.com>
> Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512
>
> Hi Vivek,
>
> How is it related to the following macro instructions from JDK-8153998?
>
> +  // special instructions for EVEX
> +  void setvectmask(Register dst, Register src);
> +  void restorevectmask();
>
> Can you reuse them? Or add variants which you can use. I see a difference in the code: kmovql vs. kmovdl.
>
> _programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking
>
>
> I don't like the following code in the assembler instructions:
> +  if (zeroing) attributes.set_is_clear_context();
>
> +  if (!no_reg_mask) {
> +    attributes.set_embedded_opmask_register_specifier(mask);
> +    if (zeroing) attributes.set_is_clear_context();
> +  }
>
> zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them.
> _embedded_opmask_register_specifier is not used (only set). Don't add values which are not used.
>
> Thanks,
> Vladimir
>
> On 4/22/16 6:10 PM, Deshpande, Vivek R wrote:
>> Hi all
>>
>> I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic.
>>
>> Could you please review and sponsor this patch.
>>
>> Bug-id:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8154975
>>
>> webrev:
>>
>> http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/
>>
>> Thanks and regards,
>>
>> Vivek
>>

From vivek.r.deshpande at intel.com  Wed Apr 27 06:53:21 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Wed, 27 Apr 2016 06:53:21 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub
	generation and intrinsics
In-Reply-To: <74581C88-CC26-441E-933B-73954C56F077@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
	<57162A88.7030608@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com>
	<5717D7E8.5000108@oracle.com>
	<53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com>
	<74581C88-CC26-441E-933B-73954C56F077@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A89A1C@ORSMSX106.amr.corp.intel.com>

Hi Christian

I have updated the webrev and link for the same is here:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.03/
I am using mathfunc() to call the masm version of call_VM_leaf_base() and not the InterpreterMacroAssembler version.

Regards,
Vivek


From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Thursday, April 21, 2016 2:35 PM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>
Cc: Nils Eliasson <nils.eliasson at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov <vladimir.kozlov at oracle.com>
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics


On Apr 20, 2016, at 2:13 PM, Deshpande, Vivek R <vivek.r.deshpande at intel.com> wrote:

Hi

The correct URL for the updated webrev is this.
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/


+void MacroAssembler::mathfunc(address runtime_entry) {
I don't like the name of this method.  Mainly because it's only aligning the stack (shouldn't that happen somewhere else?) and doing this 0x20 stack frame thing which I still don't understand.

Right, this is the one I was thinking about:

void MacroAssembler::call_VM_leaf_base(address entry_point, int num_args) {


Sorry for the spam.

Regards,
Vivek

From: Deshpande, Vivek R
Sent: Wednesday, April 20, 2016 5:10 PM
To: Deshpande, Vivek R; 'Nils Eliasson'; 'hotspot-compiler-dev at openjdk.java.net'
Cc: 'Vladimir Kozlov'; 'Volker Simonis'; 'Christian Thalinger'; Viswanathan, Sandhya
Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

Sent out the wrong link by mistake.

updated webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/<http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/>

Regards
Vivek


From: Deshpande, Vivek R
Sent: Wednesday, April 20, 2016 5:07 PM
To: 'Nils Eliasson'; hotspot-compiler-dev at openjdk.java.net
Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya
Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

Hi Nils

I have updated the webrev with all the suggestions.
updated webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/
Thanks for your comments and review.

@Vladimir,
I have taken care of all the comments. Would you please review and sponsor the patch.

Thanks and regards,
Vivek

From: Nils Eliasson [mailto:nils.eliasson at oracle.com]
Sent: Wednesday, April 20, 2016 12:27 PM
To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net
Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

In vmSymbols.cpp together with the other flag checks.

Regards,
Nils
On 2016-04-20 02:44, Deshpande, Vivek R wrote:
HI Nils

Yes, you are right: the function accesses the command line flag DisableIntrinsic and the changes are static.
Could you point me to the right location for the function?
Also I have updated the webrev with rest of the comments here:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/

Regards,
Vivek

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson
Sent: Tuesday, April 19, 2016 5:55 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

Hi Vivek,

The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context, that is OK, but it doesn't belong in the compilerDirectives files if it doesn't use directives.
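To illustrate why such a check needs no method context, here is a minimal sketch of a purely flag-driven test. It assumes the flag value is a comma-separated list of intrinsic IDs (for example "_dsin,_dcos"); is_intrinsic_disabled here is a hypothetical free function, not the signature in the webrev or in vmSymbols.cpp.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: returns true if 'intrinsic_name' appears in a
// comma-separated, DisableIntrinsic-style list. It reads only the flag
// string, so it can run at stub-generation time with no method context.
bool is_intrinsic_disabled(const std::string& disable_list,
                           const std::string& intrinsic_name) {
  std::istringstream in(disable_list);
  std::string token;
  while (std::getline(in, token, ',')) {
    if (token == intrinsic_name) return true;
  }
  return false;
}
```

Per-method directives, by contrast, would need the method being compiled, which stub generation at startup does not have.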

Regards,
Nils
On 2016-04-18 19:38, Deshpande, Vivek R wrote:
Hi all

I would like to contribute a patch which helps to control the intrinsics in the interpreter, C1 and C2 by disabling stub generation.
It uses the -XX:DisableIntrinsic option to achieve this.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

Thanks and regards,
Vivek



From vladimir.kozlov at oracle.com  Wed Apr 27 07:48:24 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Apr 2016 00:48:24 -0700
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <571F500E.5020904@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
	<571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com>
	<DC5053CC-CE39-4FA1-BC7F-8C74EE40D761@oracle.com>
	<571F500E.5020904@oracle.com>
Message-ID: <499d9f6b-378f-e350-eb72-c35930360825@oracle.com>

I think the following branch should use greaterEqual (branch from the loop when limit+8 == 0) to avoid reading 8 bytes beyond the array.

+  // Bail out if we reached the end (but still do the comparison)
+  br(Assembler::positive, false, Assembler::pn, Lremaining);

Another trick you can use is xorcc instead of cmp.
Then you need only one srlx and a compare with zero (maybe use cmp_zero_and_br()).
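A portable model of the xorcc idea applied to the tail comparison (reading past the end of the arrays and shifting out the garbage bits) may make the instruction savings concrete. It assumes big-endian loads as on SPARC, so the valid bytes are the high-order bytes of each 8-byte word and the garbage sits at the low end. tail_equal is a hypothetical name; this models the arithmetic only, not the SPARC stub.

```cpp
#include <cassert>
#include <cstdint>

// a and b are 8-byte words loaded (big-endian) from the two arrays; only
// the first 'valid_bytes' (1..8) belong to the arrays, the remaining
// low-order bytes are garbage read past the end.
bool tail_equal(uint64_t a, uint64_t b, unsigned valid_bytes) {
  uint64_t diff = a ^ b;                          // like xorcc: zero iff words equal
  unsigned garbage_bits = 8 * (8 - valid_bytes);  // low-order bits to discard
  if (garbage_bits >= 64) return true;            // no valid bytes (avoid UB shift)
  return (diff >> garbage_bits) == 0;             // one srlx, then compare with zero
}
```

Shifting the XOR result once replaces shifting both loaded words separately before a cmp.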

Thanks,
Vladimir

On 4/26/16 4:25 AM, Tobias Hartmann wrote:
> Thanks, Chris!
>
> On 25.04.2016 21:29, Christian Thalinger wrote:
>>
>>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>
>>> Hi Mikael,
>>>
>>> On 25.04.2016 09:59, Mikael Gerdin wrote:
>>>> Hi Tobias
>>>>
>>>> On 2016-04-25 08:38, Tobias Hartmann wrote:
>>>>> Hi,
>>>>>
>>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).
>>>>
>>>> I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned.
>>>>
>>>> With compressed oops enabled, the mark word + compressed class + alength
>>>> add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned.
>>>> If compressed klass pointers are disabled, the array header will become 20 bytes and I don't know how the contents are laid out in that case.
>>>
>>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned.
>>
>> See arrayOop.hpp:
>>
>>   // Returns the offset of the first element.
>>   static int base_offset_in_bytes(BasicType type) {
>>     return header_size(type) * HeapWordSize;
>>   }
>
> I verified that Mikael's claim is right:
> With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4).
> Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4).
>
> However, the header size is always aligned to HeapWordSize:
>
> static int header_size_in_bytes() {
>   size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize);
>
> and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned.
>
> Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance:
> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/
>
> Thanks,
> Tobias
>
>>>
>>> Best regards,
>>> Tobias
>>>
>>>>
>>>> /Mikael
>>>>
>>>>>
>>>>> If there are no objections, I would like to push the basic version:
>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>
>>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>>>>>
>>>>> Thanks,
>>>>> Tobias
>>>>>
>>>>> On 20.04.2016 15:31, Tobias Hartmann wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> On 20.04.2016 03:46, John Rose wrote:
>>>>>>> So I started looking at your code and my inner SPARC junkie took over.
>>>>>>>
>>>>>>> This is what happened:
>>>>>>>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>>>>>
>>>>>> Thanks a lot for having a look!
>>>>>>
>>>>>>> Perhaps there are some ideas that might be helpful:
>>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>>>>>
>>>>>> Right, this simplifies the code a bit:
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>>>>>
>>>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>>>>>
>>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>>>>>
>>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>>>>>
>>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>>>>>
>>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>>>>>
>>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Thanks,
>>>>>> Tobias
>>>>>>
>>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>> [3] Runtime alignment checks:
>>>>>>
>>>>>> bind(Lunaligned);
>>>>>> Label next;
>>>>>> xor3(ary1, ary2, tmp);
>>>>>> and3(tmp, 7, tmp);
>>>>>> br_null_short(tmp, Assembler::pn, next);
>>>>>> STOP("One array is unaligned!");
>>>>>> should_not_reach_here();
>>>>>> bind(next);
>>>>>> STOP("Both arrays are unaligned!");
>>>>>>
>>>>>>> On the other hand, what you wrote is nice and simple.
>>>>>>>
>>>>>>> HTH
>>>>>>> -- John
>>>>>>>
>>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>>>>>> But that's neither nice nor simple.
>>>>>>>
>>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Thanks, Vladimir!
>>>>>>>>
>>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>>>>>>
>>>>>>>> Okay, I'll push the basic version.
>>>>>>>>
>>>>>>>> For reference, here are the results on a SPARC T4:
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>>>>>
>>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>>>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>>>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>>>>>
>>>>>>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>>>>>
>>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>>>>>>>
>>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>>>>>> Error: Could not create the Java Virtual Machine.
>>>>>>>> Error: A fatal exception has occurred. Program will exit
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> please review the following enhancement:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>>>>>
>>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>>>>>
>>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>>>>>>>>>
>>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>>>>>>
>>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>>>>>
>>>>>>>>>> I evaluated the following three versions of the patch.
>>>>>>>>>>
>>>>>>>>>> -- Basic --
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>>
>>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>>>>>
>>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>> Version "small" tries to improve this.
>>>>>>>>>>
>>>>>>>>>> -- Prefetching --
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>>
>>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>>>>>
>>>>>>>>>> -- Small --
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>>>>>
>>>>>>>>>> The numbers can be found here:
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>>>>>
>>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>>
>>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Tobias
>>>>>>>>>>
>>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>
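The header arithmetic from this thread can be checked at compile time for a 64-bit VM. kHeapWordSize, align_up and array_header_bytes are illustrative names (align_up stands in for align_size_up and assumes a power-of-two alignment), and sizeof(int) is assumed to be 4. Assuming array objects themselves start on an 8-byte boundary, a header size that is a multiple of HeapWordSize puts the first element on an 8-byte boundary in both configurations.

```cpp
#include <cassert>
#include <cstddef>

// 64-bit VM: HeapWordSize is 8 bytes.
constexpr size_t kHeapWordSize = 8;

// Round 'size' up to a multiple of 'alignment' (a power of two).
constexpr size_t align_up(size_t size, size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// mark word (8) + klass pointer (4 if compressed, else 8) + length (4),
// rounded up to HeapWordSize as in header_size_in_bytes().
constexpr size_t array_header_bytes(bool compressed_class_pointers) {
  return align_up(8 + (compressed_class_pointers ? 4 : 8) + sizeof(int),
                  kHeapWordSize);
}

static_assert(array_header_bytes(true) == 16, "8 + 4 + 4 is already aligned");
static_assert(array_header_bytes(false) == 24, "8 + 8 + 4 = 20 rounds up to 24");
```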

From rwestrel at redhat.com  Wed Apr 27 08:00:11 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 27 Apr 2016 10:00:11 +0200
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
Message-ID: <5720718B.6020102@redhat.com>

Hi Martin,

> seems like this issue is related to what I have sent out today:
> RFR(S): 8154836: VM crash due to "Base pointers must match"
> I also had to change the AddP case of final graph reshaping.
> 
> In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode.
> 
> Maybe I have to remove that part of my change, or at least adapt it.
> We should make sure that the changes don't get pushed on the same day.

Thanks for the heads up. It looks like your change will get in before
mine. I'll send an updated webrev once it's integrated.

Roland.

From stefan.karlsson at oracle.com  Wed Apr 27 08:19:38 2016
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 27 Apr 2016 10:19:38 +0200
Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose
Message-ID: <5720761A.4010004@oracle.com>

Hi all,

Please review this patch to silence the DirectiveParser_test internal VM 
test.

http://cr.openjdk.java.net/~stefank/8155206/webrev.01
https://bugs.openjdk.java.net/browse/JDK-8155206

Before the patch, we got the following output when running with 
-XX:+ExecuteInternalVMTests:

...
Running test: Test_log_file_startup_truncation
Running test: Test_invalid_log_file
Running test: DirectivesParser_test
Internal error on line 1 byte 2: Directive missing required match.
   Got EOS.
   At '}'.
{}
Internal error on line 1 byte 3: Directive missing required match.
   At '}]'.
[{}]
Internal error on line 1 byte 3: Directive missing required match.
   At '},{}]'.
[{},{}]
Internal error on line 1 byte 2: Directive missing required match.
   At '},{}'.
{},{}
Syntax error on line 2 byte 3: DirectivesParser can only start with an 
array containing directive objects, or one single directive.
   At '['.
   [
     {
       match: "foo/bar.*",
       inline : "+java/util.*",
       PrintAssembly: true,
       BreakAtExecute: true,
     }
   ]
]

Value error on line 4 byte 20: The key 'PrintInlining' does not allow an 
array of values.
   At '['.
     PrintInlining: [
       true,
       false
     ],
   }
]

Warning: +LogCompilation must be set to enable compilation logging from 
directives
Warning: +LogCompilation must be set to enable compilation logging from 
directives
Value error on line 7 byte 9: Method pattern error: Missing leading 
inline type (+/-)
   At '"foo",'.
         "foo",
         "bar",
       ]
     }
   }
]

Value error on line 8 byte 9: Method pattern error: Missing leading 
inline type (+/-)
   At '"bar",'.
         "bar",
       ]
     }
   }
]

Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key.
   At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'.
[{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]
Value error on line 5 byte 12: Key of type match needs a value of type 
string
   At 'true,'.
     match: true,
     inline: true,
     enable: true,
     c1: {
       preset: true,
     }
   }
]

Running test: Test_TempNewSymbol
Running test: VMStructs_test
...

With the patch the output is much less noisy:
...
Running test: Test_log_file_startup_truncation
Running test: Test_invalid_log_file
Running test: DirectivesParser_test
Warning:  +LogCompilation must be set to enable compilation logging from 
directives
Warning:  +LogCompilation must be set to enable compilation logging from 
directives
Running test: Test_TempNewSymbol
Running test: VMStructs_test
...

We might want to get rid of the Warning messages, but I think the 
proposed patch is a good first step.

You can turn on the old output with -XX:+VerboseInternalVMTests.
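The idea behind the patch is to route the test's diagnostics through a single print path guarded by a flag. A minimal sketch, assuming a boolean that stands in for -XX:+VerboseInternalVMTests; TestLog is a hypothetical helper, not HotSpot's outputStream, and the actual webrev may be structured differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: diagnostics are recorded only when 'verbose'
// (standing in for the VerboseInternalVMTests flag) is true.
class TestLog {
 public:
  explicit TestLog(bool verbose) : verbose_(verbose) {}
  void print(const std::string& msg) {
    if (verbose_) out_ << msg << '\n';
  }
  std::string contents() const { return out_.str(); }

 private:
  bool verbose_;
  std::ostringstream out_;
};
```

The expected parse errors are still produced and checked by the test; only their printing depends on the flag.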

Tested with -XX:+ExecuteInternalVMTests :)

Thanks,
StefanK

From zoltan.majo at oracle.com  Wed Apr 27 08:31:07 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 27 Apr 2016 10:31:07 +0200
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
In-Reply-To: <57201A20.7000504@oracle.com>
References: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
	<57201A20.7000504@oracle.com>
Message-ID: <572078CB.9010201@oracle.com>

Hi Martin,


thank you for fixing this problem! The fix looks good to me as well, I 
have only one minor comment (please see inline).

On 04/27/2016 03:47 AM, Vladimir Kozlov wrote:
> [...]
>
>>
>> I made an additional change: I think the graph transformation doesn't 
>> make sense if decoding is expensive.
>> Therefore, I skip it on non-X86 platforms when we're running in heap 
>> based compressed oops mode.
>> (I believe X86 is the only platform which can match the decoding in 
>> the operand in this case.)
>
> It is not related to the fix and should be done separately. Or don't 
> do at all.
> There is a comment above the code which explains why it could be beneficial
> on SPARC too:
>
> 2845       // On sparc loading 32-bits constant and decoding it have less
> 2846       // instructions (4) then load 64-bits constant (7).

If you decide to fix this, the condition

2856 if ((op == Op_ConN && Universe::narrow_oop_shift() != 0) ||
2857 (op == Op_ConNKlass && Universe::narrow_klass_shift() != 0)) {

seems better than

2856         if ((op == Op_ConN      && Universe::narrow_oop_base()   != NULL) ||
2857             (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) {

for two reasons:
- on non-x86 platforms, matching is guarded by the 'narrow.*shift' methods;
- if the heap is between 4GB and 32GB, 'narrow.*base' is NULL and
  'narrow.*shift' != 0, therefore the instructions are not emitted.
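To make the difference between the two guards concrete, here is a minimal C++ model of the compressed-oop decode step, decode(n) = base + (n << shift). It is a sketch under stated assumptions: the struct and function names are hypothetical (not HotSpot's Universe API), and it assumes 8-byte object alignment, so the shift is 3 whenever it is non-zero (which is where the 32GB bound comes from: 2^32 * 8 bytes).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a compressed-oop encoding (not HotSpot code).
struct NarrowOopMode {
  uint64_t base;   // 0 in unscaled and zero-based modes, non-zero in heap-based mode
  unsigned shift;  // 0 in unscaled mode, 3 otherwise (assumes 8-byte alignment)
};

// Decoding a narrow oop: base + (narrow << shift).
uint64_t decode_narrow(uint32_t narrow, NarrowOopMode m) {
  return m.base + (static_cast<uint64_t>(narrow) << m.shift);
}

// The two candidate guards from the discussion above.
bool guard_on_base(NarrowOopMode m)  { return m.base != 0; }
bool guard_on_shift(NarrowOopMode m) { return m.shift != 0; }
```

For a zero-based heap (base 0, shift 3), the base guard is false while the shift guard is true, so the two conditions disagree exactly in the 4GB to 32GB range described above.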


Please correct me if I'm wrong.

> [...]
>> Please review. I will also need a sponsor, please.

I'd be happy to sponsor the change.

Thank you!

Best regards,


Zoltan

>>
>> Best regards,
>> Martin
>>
>>


From martin.doerr at sap.com  Wed Apr 27 08:40:39 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 27 Apr 2016 08:40:39 +0000
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
In-Reply-To: <57201A20.7000504@oracle.com>
References: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
	<57201A20.7000504@oracle.com>
Message-ID: <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap>

Hi Vladimir and Zoltan,

thanks for reviewing.

I have made the requested changes:
- Removed the code which skips the transformation on non-X86 platforms in heap-based compressed oops mode. I think we can discuss that independently.
- Improved the assertion as suggested by Vladimir.

New webrev is here:
http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/

Zoltan, you have already offered to sponsor this change. Thanks.

Best regards,
Martin


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Mittwoch, 27. April 2016 03:47
To: Doerr, Martin <martin.doerr at sap.com>; Zoltán Majó <zoltan.majo at oracle.com>; hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match"

Thank you, Martin

On 4/25/16 4:04 AM, Doerr, Martin wrote:
> Hi all,
>
> we have already seen such an assertion in final_graph_reshaping.
>
> We found 2 AddP nodes in a row. The ideal graph looked like this (simplified):
> N0 ConP
> N1 ConN
> N2 AddP(Base = N0, Address = N0)
> N3 AddP(Base = N0, Address = N2)
>
> Final graph reshaping visited N2 before N3 and changed the graph:
> N0 ConP
> N1 ConN
> N4 DecodeN
> N2 AddP(Base = N4, Address = N4)
> N3 AddP(Base = N0, Address = N2)
>
> Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected.
>
> I made a change to reconnect N3's Base input to N4, too.
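The reshaping problem described here can be reproduced in a toy C++ model. This is a sketch, not C2 code: Node, bases_match() and reshape_addp() are hypothetical names, a node counts as an AddP simply when it has a Base input, the ConN N1 feeding the DecodeN is left out, and the check only mirrors the spirit of the "Base pointers must match" assertion for two chained AddPs.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy IR node (not C2's Node class).
struct Node {
  std::string name;
  Node* base = nullptr;     // AddP Base input; nullptr for non-AddP nodes
  Node* address = nullptr;  // AddP Address input
  bool is_addp() const { return base != nullptr; }
};

// Spirit of the failing assert: if an AddP's Address is itself an AddP,
// both must agree on the Base.
bool bases_match(const Node& addp) {
  const Node* inner = addp.address;
  return !inner->is_addp() || inner->base == addp.base;
}

// Replace uses of the ConP by the DecodeN in 'addp'. If 'chained' (an AddP
// whose Address is 'addp') is given, reconnect its Base as well: this is
// the extra step the fix adds for N3.
void reshape_addp(Node& addp, Node* con_p, Node* decode_n, Node* chained) {
  if (addp.base == con_p) addp.base = decode_n;
  if (addp.address == con_p) addp.address = decode_n;
  if (chained != nullptr && chained->address == &addp && chained->base == con_p) {
    chained->base = decode_n;
  }
}
```

Calling reshape_addp() without the chained node leaves N3 with Base = N0 while N2 now has Base = N4, which is exactly the mismatch the assertion caught.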
>
> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/

Fix looks good.

>
> In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row.
> I wouldn't expect that to happen. Not sure if this assertion is desired.

We should not have a 3rd AddP, but I agree with the assert. You should add an additional check to the assert:

|| out_j->in(AddPNode::Base) != addp

>
> I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive.
> Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode.
> (I believe X86 is the only platform which can match the decoding in the operand in this case.)

It is not related to the fix and should be done separately. Or don't do at all.
There is a comment above the code which explains why it could be beneficial on SPARC too:

2845       // On sparc loading 32-bits constant and decoding it have less
2846       // instructions (4) then load 64-bits constant (7).

Thanks,
Vladimir

>
> Please review. I will also need a sponsor, please.
>
> Best regards,
> Martin
>
>

From zoltan.majo at oracle.com  Wed Apr 27 08:44:07 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 27 Apr 2016 10:44:07 +0200
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
In-Reply-To: <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap>
References: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
	<57201A20.7000504@oracle.com>
	<1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap>
Message-ID: <57207BD7.9070701@oracle.com>

Hi Martin,


On 04/27/2016 10:40 AM, Doerr, Martin wrote:
> Hi Vladimir and Zoltan,
>
> thanks for reviewing.
>
> I have made the requested changes:
> - Removed the code which skips the transformation on non-X86 platforms in heap-based compressed oops mode. I think we can discuss that independently.
> - Improved the assertion as suggested by Vladimir.
>
> New webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/

looks good to me!

> Zoltan, you have already offered to sponsor this change. Thanks.

Sure. I can push the change once Vladimir has also agreed to it.

Best regards,


Zoltan

>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Mittwoch, 27. April 2016 03:47
> To: Doerr, Martin <martin.doerr at sap.com>; Zoltán Majó <zoltan.majo at oracle.com>; hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match"
>
> Thank you, Martin
>
> On 4/25/16 4:04 AM, Doerr, Martin wrote:
>> Hi all,
>>
>> we have already seen such an assertion in final_graph_reshaping.
>>
>> We found 2 AddP nodes in a row. The ideal graph looked like this (simplified):
>> N0 ConP
>> N1 ConN
>> N2 AddP(Base = N0, Address = N0)
>> N3 AddP(Base = N0, Address = N2)
>>
>> Final graph reshaping visited N2 before N3 and changed the graph:
>> N0 ConP
>> N1 ConN
>> N4 DecodeN
>> N2 AddP(Base = N4, Address = N4)
>> N3 AddP(Base = N0, Address = N2)
>>
>> Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected.
>>
>> I made a change to reconnect N3's Base input to N4, too.
>>
>> Webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/
> Fix looks good.
>
>> In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row.
>> I wouldn't expect that to happen. Not sure if this assertion is desired.
> We should not have a 3rd AddP, but I agree with the assert. You should add an additional check to the assert:
>
> || out_j->in(AddPNode::Base) != addp
>
>> I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive.
>> Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode.
>> (I believe X86 is the only platform which can match the decoding in the operand in this case.)
> It is not related to the fix and should be done separately. Or don't do at all.
> There is a comment above the code which explains why it could be beneficial on SPARC too:
>
> 2845       // On sparc loading 32-bits constant and decoding it have less
> 2846       // instructions (4) then load 64-bits constant (7).
>
> Thanks,
> Vladimir
>
>> Please review. I will also need a sponsor, please.
>>
>> Best regards,
>> Martin
>>
>>


From tobias.hartmann at oracle.com  Wed Apr 27 08:45:42 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 27 Apr 2016 10:45:42 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <499d9f6b-378f-e350-eb72-c35930360825@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
	<571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com>
	<DC5053CC-CE39-4FA1-BC7F-8C74EE40D761@oracle.com>
	<571F500E.5020904@oracle.com>
	<499d9f6b-378f-e350-eb72-c35930360825@oracle.com>
Message-ID: <57207C36.3030101@oracle.com>

Hi Vladimir,

On 27.04.2016 09:48, Vladimir Kozlov wrote:
> I think next should be greaterEqual (branch from loop when limit+8 == 0) to avoid reading 8 bytes beyond array.
> 
> +  // Bail out if we reached the end (but still do the comparison)
> +  br(Assembler::positive, false, Assembler::pn, Lremaining);

According to the SPARC manual [1], bpos branches if "not N", i.e. if limit is >= 0. I verified this with a small program (attached) using inline function templates [2]:

$ cc -O main.c code.il
$ ./a.out
 1: 1
 0: 1
-1: 0

That means the bpos branch is taken for 0 and 1 but not for -1. So the code should be correct.
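The branch condition can also be written down as a tiny C++ model of the N (negative) flag. This is only a sketch of the semantics described in the SPARC manual cited above, not of the intrinsic itself; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// N is the sign bit of the result that set the condition codes.
bool n_flag(int64_t result) { return result < 0; }

// bpos ("branch on positive") is taken when N is clear, i.e. result >= 0.
// Zero therefore counts as "positive" and the branch is taken.
bool bpos_taken(int64_t result) { return !n_flag(result); }
```

Evaluating bpos_taken() for 1, 0 and -1 gives true, true, false, matching the 1/1/0 output of the test program.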

> Another trick you can use is xorcc instead of cmp.
> Then you need only one srlx and can compare the result with zero (maybe using cmp_zero_and_br()).

Yes, John already suggested this but as I wrote in another email, it does not improve but slightly degrade performance.

Thanks,
Tobias

[1] http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf
[2] https://docs.oracle.com/cd/E26502_01/html/E28387/gmabr.html

> 
> Thanks,
> Vladimir
> 
> On 4/26/16 4:25 AM, Tobias Hartmann wrote:
>> Thanks, Chris!
>>
>> On 25.04.2016 21:29, Christian Thalinger wrote:
>>>
>>>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>
>>>> Hi Mikael,
>>>>
>>>> On 25.04.2016 09:59, Mikael Gerdin wrote:
>>>>> Hi Tobias
>>>>>
>>>>> On 2016-04-25 08:38, Tobias Hartmann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).
>>>>>
>>>>> I think that if you disable compressed klass pointers (or compressed oops) then I suspect that array contents are not always 8 byte aligned.
>>>>>
>>>>> With compressed oops enabled the mark word + compressed class + length
>>>>> add up to 16 bytes and thus the array contents are guaranteed to be 8 byte aligned.
>>>>> If compressed klass pointers are disabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case.
>>>>
>>>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned.
>>>
>>> See arrayOop.hpp:
>>>
>>>   // Returns the offset of the first element.
>>>   static int base_offset_in_bytes(BasicType type) {
>>>     return header_size(type) * HeapWordSize;
>>>   }
>>
>> I verified that Mikael's claim is right:
>> With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4).
>> Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4).
>>
>> However, the header size is always aligned to HeapWordSize:
>>
>> static int header_size_in_bytes() {
>>   size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize);
>>
>> and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned.
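The arithmetic can be checked with a small C++ sketch that mirrors header_size_in_bytes() as quoted above. It assumes a 64-bit VM (HeapWordSize = 8, 8-byte mark word) and a 4-byte int; align_up() here is a local stand-in for HotSpot's align_size_up(), not the real function.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kHeapWordSize = 8;  // assumption: 64-bit VM

// Local stand-in for align_size_up(): round value up to a multiple of alignment.
constexpr size_t align_up(size_t value, size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Length offset = mark word (8) + klass field (4 if compressed, 8 if not);
// the header ends after the 4-byte length and is rounded up to a heap word.
constexpr size_t array_header_bytes(bool compressed_class_pointers) {
  return align_up(8 + (compressed_class_pointers ? 4 : 8) + sizeof(int), kHeapWordSize);
}

static_assert(array_header_bytes(true) == 16, "mark + narrow klass + length");
static_assert(array_header_bytes(false) == 24, "20 bytes rounded up to 24");
```

In both configurations the first element starts at a multiple of 8, which is why the alignment check can be dropped on 64-bit SPARC.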
>>
>> Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance:
>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/
>>
>> Thanks,
>> Tobias
>>
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>>>
>>>>> /Mikael
>>>>>
>>>>>>
>>>>>> If there are no objections, I would like to push the basic version:
>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>
>>>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>>>>>>
>>>>>> Thanks,
>>>>>> Tobias
>>>>>>
>>>>>> On 20.04.2016 15:31, Tobias Hartmann wrote:
>>>>>>> Hi John,
>>>>>>>
>>>>>>> On 20.04.2016 03:46, John Rose wrote:
>>>>>>>> So I started looking at your code and my inner SPARC junkie took over.
>>>>>>>>
>>>>>>>> This is what happened:
>>>>>>>>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>>>>>>
>>>>>>> Thanks a lot for having a look!
>>>>>>>
>>>>>>>> Perhaps there are some ideas that might be helpful:
>>>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>>>>>>
>>>>>>> Right, this simplifies the code a bit:
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>>>>>>
>>>>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>>>>>>
>>>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>>>>>>
>>>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>>>>>>
>>>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>>>>>>
>>>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>>>>>>
>>>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>
>>>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>>>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>> [3] Runtime alignment checks:
>>>>>>>
>>>>>>> bind(Lunaligned);
>>>>>>> Label next;
>>>>>>> xor3(ary1, ary2, tmp);
>>>>>>> and3(tmp, 7, tmp);
>>>>>>> br_null_short(tmp, Assembler::pn, next);
>>>>>>> STOP("One array is unaligned!");
>>>>>>> should_not_reach_here();
>>>>>>> bind(next);
>>>>>>> STOP("Both arrays are unaligned!");
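The xor3/and3 pair in this check relies on a simple bit trick, sketched here in C++ with hypothetical helper names: the low three bits of (ary1 ^ ary2) are zero exactly when both addresses have the same offset within a doubleword. On the unaligned path that means both arrays are misaligned by the same amount; a non-zero result means they are misaligned differently (for example, one aligned and one not).

```cpp
#include <cassert>
#include <cstdint>

// Same offset modulo 8 <=> the low three bits of the xor are zero.
bool same_doubleword_offset(uintptr_t ary1, uintptr_t ary2) {
  return ((ary1 ^ ary2) & 7) == 0;
}

// Offset of an address within its doubleword (0 means 8-byte aligned).
unsigned doubleword_offset(uintptr_t addr) {
  return static_cast<unsigned>(addr & 7);
}
```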
>>>>>>>
>>>>>>>> On the other hand, what you wrote is nice and simple.
>>>>>>>>
>>>>>>>> HTH
>>>>>>>> John
>>>>>>>>
>>>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>>>>>>> But that's neither nice nor simple.
>>>>>>>>
>>>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>>>>>>
>>>>>>>>> Thanks, Vladimir!
>>>>>>>>>
>>>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>>>>>>>
>>>>>>>>> Okay, I'll push the basic version.
>>>>>>>>>
>>>>>>>>> For reference, here are the results on a SPARC T4:
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>>>>>>
>>>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>>>>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>>>>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>>>>>>
>>>>>>>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>>>>>>
>>>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>>>>>>>>
>>>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>>>>>>> Error: Could not create the Java Virtual Machine.
>>>>>>>>> Error: A fatal exception has occurred. Program will exit
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tobias
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> please review the following enhancement:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>>>>>>
>>>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>>>>>>
>>>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
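A portable C++ sketch of the read-over-the-end idea follows. It is not the SPARC code and not the actual array_equals_loop(): the function names are hypothetical, it emulates SPARC's big-endian loads byte by byte, and it assumes both buffers are padded to a multiple of 8 bytes, so reading the final partial chunk stays inside the allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load 8 bytes in big-endian order (as SPARC does), so the first array
// element ends up in the most significant byte.
static uint64_t load_be64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) v = (v << 8) | p[i];
  return v;
}

// Compares 'len' bytes of a and b, 8 bytes at a time.
// Assumption: both buffers are padded to a multiple of 8 bytes.
bool array_equals_sketch(const uint8_t* a, const uint8_t* b, size_t len) {
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) return false;
  }
  size_t remaining = len - i;  // 0..7 bytes left
  if (remaining == 0) return true;
  // Read a full chunk past the end, then shift out the garbage bytes,
  // which sit in the low-order end of a big-endian load.
  unsigned garbage_bits = static_cast<unsigned>((8 - remaining) * 8);
  return (load_be64(a + i) >> garbage_bits) == (load_be64(b + i) >> garbage_bits);
}
```

Because the big-endian load puts the first element in the most significant byte, the bytes past the end land in the low-order bits, which is why a right shift discards them.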
>>>>>>>>>>>
>>>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>>>>>>>
>>>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>>>>>>
>>>>>>>>>>> I evaluated the following three versions of the patch.
>>>>>>>>>>>
>>>>>>>>>>> -- Basic --
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>>>
>>>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>>>>>>
>>>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>>> Version "small" tries to improve this.
>>>>>>>>>>>
>>>>>>>>>>> -- Prefetching --
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>>>
>>>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>>>>>>
>>>>>>>>>>> -- Small --
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>>>>>>
>>>>>>>>>>> The numbers can be found here:
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>>>>>>
>>>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>>>
>>>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Tobias
>>>>>>>>>>>
>>>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: code.c
Type: text/x-csrc
Size: 127 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160427/bf628acb/code.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.c
Type: text/x-csrc
Size: 143 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160427/bf628acb/main.c>

From martin.doerr at sap.com  Wed Apr 27 08:46:46 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 27 Apr 2016 08:46:46 +0000
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
In-Reply-To: <572078CB.9010201@oracle.com>
References: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
	<57201A20.7000504@oracle.com> <572078CB.9010201@oracle.com>
Message-ID: <3b9e4a9d845a4b83859c61cd6459ca24@DEWDFE13DE14.global.corp.sap>

Hi Zoltan,

I will comment on skipping the transformation on non-x86 platforms in 8154826 and include Roland, who is changing the final graph reshaping of AddP nodes.

Best regards,
Martin


-----Original Message-----
From: Zoltán Majó [mailto:zoltan.majo at oracle.com]
Sent: Mittwoch, 27. April 2016 10:31
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match"

Hi Martin,


thank you for fixing this problem! The fix looks good to me as well, I 
have only one minor comment (please see inline).

On 04/27/2016 03:47 AM, Vladimir Kozlov wrote:
> [...]
>
>>
>> I made an additional change: I think the graph transformation doesn't 
>> make sense if decoding is expensive.
>> Therefore, I skip it on non-X86 platforms when we're running in heap 
>> based compressed oops mode.
>> (I believe X86 is the only platform which can match the decoding in 
>> the operand in this case.)
>
> It is not related to the fix and should be done separately. Or don't 
> do at all.
> There is a comment above the code which explains why it could be beneficial
> on SPARC too:
>
> 2845       // On sparc loading 32-bits constant and decoding it have less
> 2846       // instructions (4) then load 64-bits constant (7).

If you decide to fix this, the condition

2856 if ((op == Op_ConN && Universe::narrow_oop_shift() != 0) ||
2857 (op == Op_ConNKlass && Universe::narrow_klass_shift() != 0)) {

seems better than

2856         if ((op == Op_ConN      && Universe::narrow_oop_base()   != NULL) ||
2857             (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) {

for two reasons:
- on non-x86 platforms, matching is guarded by the 'narrow.*shift' methods;
- if the heap is between 4GB and 32GB, 'narrow.*base' is NULL and
  'narrow.*shift' != 0, therefore the instructions are not emitted.


Please correct me if I'm wrong.

> [...]
>> Please review. I will also need a sponsor, please.

I'd be happy to sponsor the change.

Thank you!

Best regards,


Zoltan

>>
>> Best regards,
>> Martin
>>
>>


From martin.doerr at sap.com  Wed Apr 27 09:11:28 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 27 Apr 2016 09:11:28 +0000
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <5720718B.6020102@redhat.com>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
Message-ID: <ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>

Hi Roland,

I have removed the piece of my change which would interfere with your change.

But I'd like to explain the intention of skipping the transformation on non-x86 platforms in heap-based compressed oops mode.
(In simpler compressed oops modes decoding is very cheap so the transformation is probably good.)
Without transformation we have: LoadConP + Storage access
With transformation we have: LoadConN + DecodeN heap-based + Storage access

I believe X86 is the only platform which can match the DecodeN heap-based into the Storage access.

I guess other platforms should prefer the untransformed version:
PPC can load the ConP from the constant pool. Decoding takes a lot of instructions because the heap base needs to be loaded.

I didn't take a closer look at SPARC, but I thought it would use the constant pool as well. Not sure if the following comment is still correct:
// On sparc loading 32-bits constant and decoding it have less
// instructions (4) then load 64-bits constant (7).


Therefore, I had proposed the following code to skip:
+         // Matching decode heap based into an operand only works on X86.
+ #if !defined(X86)
+         if ((op == Op_ConN      && Universe::narrow_oop_base()   != NULL) ||
+             (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) {
+           break;
+         }
+ #endif
+

Would this be good for aarch64 as well?
Would you like to include code which skips the transformation in your change or should this better be discussed independently?

Best regards,
Martin


-----Original Message-----
From: Roland Westrelin [mailto:rwestrel at redhat.com] 
Sent: Mittwoch, 27. April 2016 10:00
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode

Hi Martin,

> seems like this issue is related to what I have sent out today:
> RFR(S): 8154836: VM crash due to "Base pointers must match"
> I also had to change the AddP case of final graph reshaping.
> 
> In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode.
> 
> Maybe I have to remove that part of my change, or at least adapt it.
> We should make sure that the changes don't get pushed on the same day.

Thanks for the heads up. It looks like your change will get in before
mine. I'll send an updated webrev once it's integrated.

Roland.

From nassim.halli at gmail.com  Wed Apr 27 09:12:07 2016
From: nassim.halli at gmail.com (Nassim Halli)
Date: Wed, 27 Apr 2016 11:12:07 +0200
Subject: Are Java native methods inlined ?
Message-ID: <CAOS5t8KwKCVA3e=7X1n9_pu1qjdrUrwXcmtAYS8cNCphZvAckw@mail.gmail.com>

Hi guys,

Just a simple question: are Java native methods inlined?

By "native methods" I mean the interface between the Java caller and the
target JNI function in the shared library (whose JIT-compiled code can be
observed using +PrintNativeNMethods).

I'm asking because I don't observe inlining with Java 8, although I heard
that it happens. I understand that it would not necessarily lead to great
improvements; I'm just asking for clarification.

Thanks a lot !

Nassim

From rahul.v.raghavan at oracle.com  Wed Apr 27 09:45:18 2016
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Wed, 27 Apr 2016 02:45:18 -0700 (PDT)
Subject: RFR: 8153655: Make intrinsics flags diagnostic and update intrinsics
	tests to enable diagnostic options.
Message-ID: <b161dc52-6169-4e55-8ec2-9349d03b172b@default>

Hi,

Please review the following patch for JDK-8153655.

Bug: https://bugs.openjdk.java.net/browse/JDK-8153655
Webrev: http://cr.openjdk.java.net/~rraghavan/8153655/webrev.00/


Notes:

1. This 8153655/webrev.00 re-includes earlier backed out, same JDK-8145348 changes 
       (https://bugs.openjdk.java.net/browse/JDK-8145348 - Make intrinsics flags diagnostic)
and also additional fixes in failing intrinsic tests.


2. Checked all usages of the changed intrinsic flags in tests and found that
the JDK-8153655 type of test failure (after the initial JDK-8145348 fix) occurs only in the following tests:
   a. UseAESIntrinsics test (compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java)
   b. UseSHA* tests (at compiler/intrinsics/sha/cli/)


3. Summary of 8153655/webrev.00 changes.

- Includes earlier backed out, same JDK-8145348 changes:
       src/share/vm/c1/c1_globals.hpp
       src/share/vm/opto/c2_globals.hpp
       src/share/vm/runtime/globals.hpp
       test/compiler/intrinsics/muladd/TestMulAdd.java
       test/compiler/runtime/6859338/Test6859338.java

- 'test/compiler/cpuflags/AESIntrinsicsBase.java'
      Options were passed in the wrong order.
      Changes done so that 'UnlockDiagnosticVMOptions' option precedes the diagnostic flags.

- 'test/compiler/intrinsics/sha/cli/*' - (UseSHA* tests)
     'UnlockDiagnosticVMOptions' option was not getting passed.
     Added support to precede intrinsic flag usages with explicit 'UnlockDiagnosticVMOptions'.
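The ordering rule behind both test fixes can be illustrated with a small C++ helper that builds a flag list with -XX:+UnlockDiagnosticVMOptions placed first. This is a hypothetical sketch, not the test library code used in the webrev; it assumes, as the fixes above imply, that the unlock option must appear before any diagnostic flag on the command line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: put -XX:+UnlockDiagnosticVMOptions in front of all
// other flags (and drop any later duplicate of it), so every diagnostic flag
// such as -XX:+UseAESIntrinsics follows the unlock option.
std::vector<std::string> with_diagnostics_unlocked(const std::vector<std::string>& flags) {
  const std::string unlock = "-XX:+UnlockDiagnosticVMOptions";
  std::vector<std::string> result{unlock};
  for (const std::string& flag : flags) {
    if (flag != unlock) result.push_back(flag);
  }
  return result;
}
```

Given {"-XX:+UseAESIntrinsics", "-XX:+UnlockDiagnosticVMOptions"} (the wrong order), the helper returns the unlock option first, followed by -XX:+UseAESIntrinsics.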


4. No issues found with testing done using product builds with proposed changes 
(hotspot/test/compiler/cpuflags/*, hotspot/test/compiler/intrinsics/*, hotspot/test/compiler/runtime/6859338/Test6859338.java)
Complete pre-integration testing using product builds is in progress.


Thanks,
Rahul

From aph at redhat.com  Wed Apr 27 10:07:28 2016
From: aph at redhat.com (Andrew Haley)
Date: Wed, 27 Apr 2016 11:07:28 +0100
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
	<ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
Message-ID: <57208F60.5000607@redhat.com>

On 27/04/16 10:11, Doerr, Martin wrote:
> Would this be good for aarch64 as well?

On AArch64, LoadConP is

    mov reg, #x
    movk reg, #y shl #16
    movk reg, #z shl #32

(3 cycles latency)

LoadConN + DecodeN (heap-based) is

   mov reg, #x
   movk reg, #y shl #16
   add reg, heapbase, reg shl #3

(4 cycles latency)
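
To make the two sequences above concrete, here is a C++ sketch that models only the values they compute: a fixed-length mov/movk sequence with one 16-bit immediate per instruction, and the heap-based decode heapbase + (narrow << 3). The helper names are hypothetical, and the model says nothing about latency.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One mov/movk step: a 16-bit immediate placed at bit position 'shift'.
struct MovImm {
  uint16_t imm;
  unsigned shift;
};

// Splits the low 'bits' bits of 'value' into 16-bit chunks, as in the
// fixed-length listing above (mov for the first chunk, movk for the rest).
std::vector<MovImm> mov_sequence(uint64_t value, unsigned bits) {
  assert(bits % 16 == 0 && bits <= 64);
  std::vector<MovImm> seq;
  for (unsigned shift = 0; shift < bits; shift += 16) {
    seq.push_back({static_cast<uint16_t>((value >> shift) & 0xffff), shift});
  }
  return seq;
}

// Replays the sequence: movk replaces one halfword and keeps the others.
uint64_t materialize(const std::vector<MovImm>& seq) {
  uint64_t reg = 0;
  for (const MovImm& m : seq) {
    reg &= ~(uint64_t{0xffff} << m.shift);
    reg |= uint64_t{m.imm} << m.shift;
  }
  return reg;
}

// Heap-based DecodeN: add reg, heapbase, reg shl #3.
uint64_t decode_narrow(uint64_t heapbase, uint32_t narrow, unsigned shift = 3) {
  return heapbase + (uint64_t{narrow} << shift);
}
```

Both paths take three instructions (three for a 48-bit pointer; two plus the add for a narrow oop); the difference Andrew points out is in latency, which this model does not capture.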

Andrew.


From aph at redhat.com  Wed Apr 27 10:33:19 2016
From: aph at redhat.com (Andrew Haley)
Date: Wed, 27 Apr 2016 11:33:19 +0100
Subject: Are Java native methods inlined ?
In-Reply-To: <CAOS5t8KwKCVA3e=7X1n9_pu1qjdrUrwXcmtAYS8cNCphZvAckw@mail.gmail.com>
References: <CAOS5t8KwKCVA3e=7X1n9_pu1qjdrUrwXcmtAYS8cNCphZvAckw@mail.gmail.com>
Message-ID: <5720956F.1060808@redhat.com>

On 27/04/16 10:12, Nassim Halli wrote:

> Just a simple question: are Java native methods inlined?

Not exactly.

> By "native methods" I mean the interface between the Java caller and
> the target native JNI function in the dynamic library (for which the
> JIT-compiled code can be observed using -XX:+PrintNativeNMethods).
>
> I ask because I don't observe inlining with Java 8, although I
> heard that it happens. I understand that it would not necessarily
> lead to great improvements; I'm just asking for clarification.

The glue code between Java and native code is not inlined into its
caller but it is generated by the JIT compiler.  Like every
optimization in HotSpot, this is only done for hot code.  If you run
something like NetBeans you'll see a lot of native nmethods generated.

Andrew.

From roland.schatz at oracle.com  Wed Apr 27 13:31:18 2016
From: roland.schatz at oracle.com (Roland Schatz)
Date: Wed, 27 Apr 2016 15:31:18 +0200
Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable
	array fields
In-Reply-To: <571A36D7.2030701@oracle.com>
References: <571A36D7.2030701@oracle.com>
Message-ID: <5720BF26.2050400@oracle.com>

Please don't integrate this yet. We're looking for a more general
solution; it's possible this won't be needed in the final version.

Thanks,
Roland

On 04/22/2016 04:36 PM, Andreas Woess wrote:
> Please review:
> http://cr.openjdk.java.net/~aw/8154765/webrev/
> https://bugs.openjdk.java.net/browse/JDK-8154765
>
> This change adds an optional stableDimensions parameter to
> ConstantReflectionProvider.readStableFieldValue that allows the number
> of stable array dimensions to be specified with finer granularity than
> inferring it from the type of the field.
>
> Thanks,
> Andreas
>

From zoltan.majo at oracle.com  Wed Apr 27 14:20:08 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 27 Apr 2016 16:20:08 +0200
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
	<571F5756.7020007@oracle.com>
	<0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>
Message-ID: <5720CA98.10605@oracle.com>

Hi Vladimir,


thank you for the feedback!

On 04/27/2016 12:39 AM, Vladimir Kozlov wrote:
> On 4/26/16 4:56 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for the feedback! Please see comments below.
>>
>> On 04/22/2016 12:37 AM, Vladimir Kozlov wrote:
>>> Hi, Zoltan
>>>
>>> I think we should change code in prefetch_allocation() instead:
>>>
>>>       Node *cache_adr = new AddPNode(old_eden_top, old_eden_top,
>>> _igvn.MakeConX(step_size + distance));
>>
>> The problem is that AllocatePrefetchStyle must align the first 
>> prefetched address to 8 bytes, otherwise the emitted stxa
>> instruction could cause a SIGBUS. But the alignment does not have to 
>> be at step_size boundary, 8-byte alignment is
>> sufficient.
>
> Actually it has to be step_size (cache line size) aligned: with
> AllocatePrefetchStyle == 3, the BIS behavior is triggered only when the
> stxa address is the beginning of a cache line. If it is only 8-byte
> aligned, it will be a simple store.

Thank you for clarifying.

> We should require AllocatePrefetchStepSize to be 8 bytes aligned in 
> vm_version_sparc.cpp instead of:
> +       // BIS instructions require 8-byte aligned addresses
> +       Node* mask = _igvn.MakeConX(~(intptr_t)(wordSize - 1));

I agree, but I'd prefer to do that in the
AllocatePrefetchStepSizeConstraintFunc() constraint function (in
commandLineFlagConstraintsCompiler.cpp). Since JEP 245, the preferred
place to validate command-line arguments is in constraint functions.
Most compiler-related checks were moved there with JDK-8078554 and
JDK-8146478.
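
The step-size rule being discussed can be sketched as a stand-alone C++ check. This is not HotSpot's constraint-function API (its signatures are not shown in this thread); the function name and message text are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-alone check, not HotSpot's constraint-function API.
// Returns an empty string if the value is acceptable, else an error message.
std::string check_allocate_prefetch_step_size(int64_t style, int64_t step_size) {
  // With AllocatePrefetchStyle == 3 the first prefetch address is aligned to
  // step_size, and stxa (like stx) needs at least 8-byte alignment.
  if (style == 3 && step_size % 8 != 0) {
    return "AllocatePrefetchStepSize (" + std::to_string(step_size) +
           ") must be a multiple of 8 when AllocatePrefetchStyle is 3";
  }
  return "";
}
```

Keeping the rule in one validation function, rather than masking the address in compiled code, means bad values are rejected at startup.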

I'd also like to set the minimum value for AllocatePrefetchDistance to
AllocatePrefetchStepSize; otherwise, we can access heap memory located
before newly allocated objects/arrays.
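
The reason for the minimum distance follows from the address arithmetic in the original report: cache_adr = (old_eden_top + distance) & ~(AllocatePrefetchStepSize - 1). A small C++ sketch, assuming the step size is a power of two (as the mask requires); the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling the first prefetch address for
// AllocatePrefetchStyle == 3: offset old_eden_top by the prefetch distance,
// then align down to the step size (the cache-line size).
uint64_t first_prefetch_addr(uint64_t old_eden_top, uint64_t distance,
                             uint64_t step_size) {
  // The mask below only aligns correctly for power-of-two step sizes.
  assert(step_size != 0 && (step_size & (step_size - 1)) == 0);
  uint64_t cache_adr = old_eden_top + distance;
  return cache_adr & ~(step_size - 1);
}
```

With a distance of 0 and an old top that is not cache-line aligned (for example 0x1008 with a 64-byte step), the first prefetch lands at 0x1000, before the new object. With a distance of at least the step size, the aligned address always stays above old_eden_top.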

>
>
>>
>>>
>>> This way we allow AllocatePrefetchDistance == 0 in all
>>> AllocatePrefetchStyle cases - it is consistent.
>>
>> Unfortunately, it is not easy to have the same limit for 
>> AllocatePrefetchDistance on all platforms. Due to the 8-byte
>> alignment performed by compiled code, the lower limit of 8 for 
>> AllocatePrefetchDistance is needed on SPARC; the lower
>> limit of 0 works fine on all other platforms.
>>
>> I've started looking at the consistency of flags controlling 
>> allocation prefetch in general, as we have other issues
>> open related to them (e.g., JDK-8151622). We're touching related code 
>> now, so I thought, maybe it makes sense to fix all
>> remaining issues at once.
>
> Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - it
> should be used only if UseTLAB is true. Please take a look.

It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true.
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844

Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, I've
started an RBT run of all hotspot tests on all platforms using
AllocatePrefetchStyle=2. So far no problems have shown up.

> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch
> should be set for all styles on all platforms to avoid accessing
> beyond the heap. The prefetch instruction documentation says that they
> do not trap, but we should be careful.

I agree.

That means we initialize _reserve_for_allocation_prefetch in a 
platform-independent way. So I think it would make sense to move that 
field to ThreadLocalAllocBuffer, as TLAB is the only user of that field 
and we don't support platform-independent initialization of 
Abstract_VM_Version. I did that in the updated webrev.

>>
>> The updated webrev does the following (in addition to fixing the 
>> original problem with AllocatePrefetchDistance):
>>
>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 
>> (i.e., BIS instructions are used for prefetching). As
>> far as I understand, AllocatePrefetchStyle = 3 was added to support 
>> prefetching with BIS, so if BIS is enabled, we
>> should use AllocatePrefetchStyle = 3.
>
> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select 
> AllocatePrefetchStyle = 3.

OK.

> But we should allow AllocatePrefetchStyle = 3 if normal prefetch
> instructions are used (or on other platforms).

OK, I've removed that restriction.

> I think we should update comment in globals.hpp to say "generate one 
> prefetch per cache line" without saying BIS.

OK.

>
> But I agree that if BIS is not available, we should not use BIS
> (AllocatePrefetchInstr = 1).

OK.

>
>>
>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on 
>> non-SPARC platforms.
>
> It could be useful on other platforms since it does one access per 
> cache line.

OK, let's keep it available then.

>
>
>>
>> 3. Enforce that AllocatePrefetchStepSize is a multiple of 8 if
>> AllocatePrefetchStyle is 3 (due to alignment requirements).
>
> That is correct since stxa requires at least 8-byte alignment (as does stx).

OK.

>
>>
>> 4. Determine the number of lines to prefetch in the same way for all
>> prefetch styles:
>> lines = (prefetch instance allocation) ?
>> AllocateInstancePrefetchLines : AllocatePrefetchLines
>
> Agree.

OK.

>
>>
>> Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/
>
> vm_version_sparc.cpp
> AllocatePrefetchInstr = 0 should be set for all styles (not only 1) 
> when BIS is not available.

OK. I think it is sufficient to do

  if (!has_blk_init()) {
    if (AllocatePrefetchInstr == 1) {
      warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable");
      FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0);
    }
  }

I hope I'm not missing anything here.

Here is the updated webrev:
http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/

Testing:
- JPRT (incl. TestOptionsWithRanges)
- local testing on SPARC
- all hotspot tests with AllocatePrefetchStyle=2 on all platforms.

Thank you!

Best regards,

Zoltan

>
> Thanks,
> Vladimir
>
>>
>> Testing:
>> - JPRT (incl. TestOptionsWithRanges)
>> - local testing on a SPARC machine.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>>>> Hi,
>>>>
>>>>
>>>> please review the patch for 8153340.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>>>
>>>>
>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and 
>>>> AllocatePrefetchDistance==0. The crash happens due to the way
>>>> the address for the first prefetch instruction is calculated [1]:
>>>>
>>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &=
>>>> ~(AllocatePrefetchStepSize - 1), which can zero some of
>>>> the low bits of cache_adr. This results in accesses *before* the newly
>>>> allocated object.
>>>>
>>>>
>>>> Solution: Set lower limit of AllocatePrefetchDistance to 
>>>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3).
>>>> Unquarantine test.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>>>
>>>> Testing:
>>>> - JPRT (incl. TestOptionsWithRanges.java)
>>>> - local testing on a SPARC machine.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>> [1] 
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941
>>


From zoltan.majo at oracle.com  Wed Apr 27 14:23:06 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 27 Apr 2016 16:23:06 +0200
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <5720CA98.10605@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
	<571F5756.7020007@oracle.com>
	<0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>
	<5720CA98.10605@oracle.com>
Message-ID: <5720CB4A.2030904@oracle.com>


On 04/27/2016 04:20 PM, Zoltán Majó wrote:
> [...]
>
> OK. I think it is sufficient to do
>
> 52 if (!has_blk_init()) {
> 53 if (AllocatePrefetchInstr == 1) {
> 54 warning("BIS instructions required for AllocatePrefetchInstr 1 
> unavailable");
> 55 FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0);
> 56 } I hope I'm not missing anything here. Here is the updated webrev: 
> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/ Testing: - JPRT 
> (incl. TestOptionsWithRanges); - local testing on SPARC; - all hotspot 
> tests with AllocaPrefetchStyle=2 on all platforms. Thank you! Best 
> regards, Zoltan

Sorry for the missing newlines above -- my mail client is playing tricks 
sometimes.



From vladimir.kozlov at oracle.com  Wed Apr 27 14:54:14 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Apr 2016 07:54:14 -0700
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
In-Reply-To: <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap>
References: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
	<57201A20.7000504@oracle.com>
	<1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap>
Message-ID: <a32cc4b4-db9b-53fb-2efa-91b6442b1b46@oracle.com>

This looks good.

Thanks,
Vladimir

On 4/27/16 1:40 AM, Doerr, Martin wrote:
> Hi Vladimir and Zoltan,
>
> thanks for reviewing.
>
> I have made the requested changes:
> - Removed the code which skips the transformation on non-X86 platforms in heap-based compressed oops mode. I think we can discuss that independently.
> - Improved the assertion as suggested by Vladimir.
>
> New webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/
>
> Zoltan, you have already offered to sponsor this change. Thanks.
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Mittwoch, 27. April 2016 03:47
> To: Doerr, Martin <martin.doerr at sap.com>; Zoltán Majó <zoltan.majo at oracle.com>; hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match"
>
> Thank you, Martin
>
> On 4/25/16 4:04 AM, Doerr, Martin wrote:
>> Hi all,
>>
>> we have already seen such an assertion in final_graph_reshaping.
>>
>> We found 2 AddP nodes in a row. The ideal graph looked like this (simplified):
>> N0 ConP
>> N1 ConN
>> N2 AddP(Base = N0, Address = N0)
>> N3 AddP(Base = N0, Address = N2)
>>
>> Final graph reshaping visited N2 first and changed the graph:
>> N0 ConP
>> N1 ConN
>> N4 DecodeN
>> N2 AddP(Base = N4, Address = N4)
>> N3 AddP(Base = N0, Address = N2)
>>
>> Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected.
>>
>> I made a change to reconnect N3's Base input to N4, too.
>>
>> Webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/
>
> Fix looks good.
>
>>
>> In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row.
>> I wouldn't expect that to happen. Not sure if this assertion is desired.
>
> We should not have a 3rd AddP, but I agree with the assert. You should add an additional check to the assert:
>
> || out_j->in(AddPNode::Base) != addp
>
>>
>> I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive.
>> Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode.
>> (I believe X86 is the only platform which can match the decoding in the operand in this case.)
>
> It is not related to the fix and should be done separately, or not at all.
> There is a comment above the code which explains why it could be beneficial on SPARC too:
>
> 2845       // On sparc loading 32-bits constant and decoding it have less
> 2846       // instructions (4) then load 64-bits constant (7).
>
> Thanks,
> Vladimir
>
>>
>> Please review. I will also need a sponsor, please.
>>
>> Best regards,
>> Martin
>>
>>
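The graph rewrite Martin describes can be modelled with a toy C++ node structure. This is a hypothetical simplification (real C2 nodes and the final_graph_reshaping code are far richer); it only shows why reconnecting the outer AddP's Base keeps the "Base pointers must match" invariant.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for C2 ideal-graph nodes; only
// the AddP Base/Address inputs and the out-edges matter here.
struct Node {
  std::string name;
  Node* base = nullptr;     // AddP Base input (nullptr for non-AddP nodes)
  Node* address = nullptr;  // AddP Address input
  std::vector<Node*> outs;  // users of this node
  explicit Node(std::string n) : name(std::move(n)) {}
  bool is_addp() const { return base != nullptr; }
};

// The invariant behind "Base pointers must match": an AddP whose Address is
// another AddP must share that AddP's Base.
bool bases_match(const Node& addp) {
  const Node* inner = addp.address;
  return inner == nullptr || !inner->is_addp() || inner->base == addp.base;
}

// Rewires an AddP from its ConP base to 'decoded' (the DecodeN of the ConN).
// Following the fix described above, AddP users that shared the old base
// are reconnected to 'decoded' as well.
void replace_base(Node& addp, Node* decoded) {
  Node* old_base = addp.base;
  if (addp.address == old_base) addp.address = decoded;
  addp.base = decoded;
  for (Node* out : addp.outs) {
    if (out->is_addp() && out->base == old_base) out->base = decoded;
  }
}
```

Without the loop over the users, N3 would keep N0 as its Base while its Address (N2) now uses N4, which is exactly the mismatch the assertion reports.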

From vladimir.kozlov at oracle.com  Wed Apr 27 14:56:20 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Apr 2016 07:56:20 -0700
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <57207C36.3030101@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
	<571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com>
	<DC5053CC-CE39-4FA1-BC7F-8C74EE40D761@oracle.com>
	<571F500E.5020904@oracle.com>
	<499d9f6b-378f-e350-eb72-c35930360825@oracle.com>
	<57207C36.3030101@oracle.com>
Message-ID: <18792245-5f8c-1f09-3359-4c89c33d17bd@oracle.com>

Thank you for verifying the positive condition.
Changes are good.
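
A C++ model of the condition Tobias verified: bpos branches when the N flag is clear, i.e. when the result is >= 0. It assumes a single 64-bit result and ignores the other condition-code bits; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of the SPARC condition codes set by a subcc/cmp.
// Only the N (negative) flag matters for bpos.
struct ConditionCodes {
  bool n;  // set when the sign bit of the result is 1
};

ConditionCodes set_cc(int64_t result) {
  return ConditionCodes{result < 0};
}

// bpos ("branch on positive") is taken when N is clear, i.e. result >= 0,
// so it already behaves like a "greater than or equal to zero" test.
bool bpos_taken(ConditionCodes cc) {
  return !cc.n;
}
```

This reproduces the output of Tobias's test program: taken for 1 and 0, not taken for -1.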

Thanks,
Vladimir

On 4/27/16 1:45 AM, Tobias Hartmann wrote:
> Hi Vladimir,
>
> On 27.04.2016 09:48, Vladimir Kozlov wrote:
>> I think the next branch should be greaterEqual (branch from the loop when limit+8 == 0) to avoid reading 8 bytes beyond the array.
>>
>> +  // Bail out if we reached the end (but still do the comparison)
>> +  br(Assembler::positive, false, Assembler::pn, Lremaining);
>
> According to the SPARC manual [1], bpos branches if "not N", i.e. if limit is >= 0. I verified this with a small program (attached) using inline function templates [2]:
>
> $ cc -O main.c code.il
> $ ./a.out
>  1: 1
>  0: 1
> -1: 0
>
> That means the bpos branch is taken for 0 and 1 but not for -1. So the code should be correct.
>
>> Another trick you can use is xorcc instead of cmp.
>> Then you need only one srlx and can compare the result with zero (maybe using cmp_zero_and_br()).
>
> Yes, John already suggested this but as I wrote in another email, it does not improve but slightly degrade performance.
>
> Thanks,
> Tobias
>
> [1] http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf
> [2] https://docs.oracle.com/cd/E26502_01/html/E28387/gmabr.html
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/26/16 4:25 AM, Tobias Hartmann wrote:
>>> Thanks, Chris!
>>>
>>> On 25.04.2016 21:29, Christian Thalinger wrote:
>>>>
>>>>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>>>
>>>>> Hi Mikael,
>>>>>
>>>>> On 25.04.2016 09:59, Mikael Gerdin wrote:
>>>>>> Hi Tobias
>>>>>>
>>>>>> On 2016-04-25 08:38, Tobias Hartmann wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).
>>>>>>
>>>>>> I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned.
>>>>>>
>>>>>> With compressed oops enabled, the mark word + compressed class + length
>>>>>> add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned.
>>>>>> If compressed klass pointers are disabled, the array header will become 20 bytes and I don't know how the contents are laid out in that case.
>>>>>
>>>>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned.
>>>>
>>>> See arrayOop.hpp:
>>>>
>>>>   // Returns the offset of the first element.
>>>>   static int base_offset_in_bytes(BasicType type) {
>>>>     return header_size(type) * HeapWordSize;
>>>>   }
>>>
>>> I verified that Mikael's claim is right:
>>> With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4).
>>> Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4).
>>>
>>> However, the header size is always aligned to HeapWordSize:
>>>
>>> static int header_size_in_bytes() {
>>>   size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize);
>>>
>>> and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned.
>>>
>>> Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance:
>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/
>>>
>>> Thanks,
>>> Tobias
>>>
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>>>
>>>>>> /Mikael
>>>>>>
>>>>>>>
>>>>>>> If there are no objections, I would like to push the basic version:
>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>
>>>>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>
>>>>>>> On 20.04.2016 15:31, Tobias Hartmann wrote:
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> On 20.04.2016 03:46, John Rose wrote:
>>>>>>>>> So I started looking at your code and my inner SPARC junkie took over.
>>>>>>>>>
>>>>>>>>> This is what happened:
>>>>>>>>>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>>>>>>>
>>>>>>>> Thanks a lot for having a look!
>>>>>>>>
>>>>>>>>> Perhaps there are some ideas that might be helpful:
>>>>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>>>>>>>
>>>>>>>> Right, this simplifies the code a bit:
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>>>>>>>
>>>>>>>> I did some experiments but, surprisingly, it seems that this does not improve but slightly degrades performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>>>>>>>
>>>>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>>>>>>>
>>>>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>>>>>>>
>>>>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>>>>>>>
>>>>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>>>>>>>
>>>>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>>>>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>> [3] Runtime alignment checks:
>>>>>>>>
>>>>>>>> bind(Lunaligned);
>>>>>>>> Label next;
>>>>>>>> xor3(ary1, ary2, tmp);
>>>>>>>> and3(tmp, 7, tmp);
>>>>>>>> br_null_short(tmp, Assembler::pn, next);
>>>>>>>> STOP("One array is unaligned!");
>>>>>>>> should_not_reach_here();
>>>>>>>> bind(next);
>>>>>>>> STOP("Both arrays are unaligned!");
>>>>>>>>
>>>>>>>>> On the other hand, what you wrote is nice and simple.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>>>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>>>>>>>> But that's neither nice nor simple.
>>>>>>>>>
>>>>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks, Vladimir!
>>>>>>>>>>
>>>>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>>>>>>>>
>>>>>>>>>> Okay, I'll push the basic version.
>>>>>>>>>>
>>>>>>>>>> For reference, here are the results on a SPARC T4:
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>>>>>>>
>>>>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>>>>>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>>>>>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>>>>>>>
>>>>>>>>>>> Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>>>>>>>
>>>>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>>>>>>>>>
>>>>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>>>>>>>> Error: Could not create the Java Virtual Machine.
>>>>>>>>>> Error: A fatal exception has occurred. Program will exit
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Tobias
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> please review the following enhancement:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>>>>>>>
>>>>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>>>>>>>
>>>>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>>>>>>>>>>>
>>>>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>>>>>>>>
>>>>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>>>>>>>
>>>>>>>>>>>> I evaluated the following three versions of the patch.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Basic --
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>>>>
>>>>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>>>>>>>
>>>>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>>>> Version "small" tries to improve this.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Prefetching --
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>>>>
>>>>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Small --
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>>>>>>>
>>>>>>>>>>>> The numbers can be found here:
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>>>>>>>
>>>>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Tobias
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>>>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>>>
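
A C++ sketch of two ideas from this thread, assuming a 64-bit VM: the array header arithmetic Tobias describes (mark word + klass + length, rounded up to HeapWordSize), and the 8-byte comparison loop that reads past the end and shifts the garbage bytes out. SPARC's big-endian loads are emulated byte by byte. Reading past the end of a C++ array is undefined behavior, so this sketch assumes the caller pads both buffers to a multiple of 8 bytes. All names are hypothetical; this is not MacroAssembler::array_equals().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kHeapWordSize = 8;  // 64-bit VM

constexpr size_t align_up(size_t value, size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Offset of the first array element: mark word (8) + klass (4 or 8) +
// length (4), rounded up to HeapWordSize as in header_size_in_bytes().
constexpr size_t array_base_offset(bool compressed_class_pointers) {
  return align_up(8 + (compressed_class_pointers ? 4 : 8) + 4, kHeapWordSize);
}

// Big-endian 64-bit load, emulating ldx on SPARC.
uint64_t load_be64(const unsigned char* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
  return v;
}

// Compares 'len' bytes 8 at a time. Both buffers must be readable up to
// align_up(len, 8) bytes; the bytes past 'len' may hold garbage.
bool array_equals_8(const unsigned char* a, const unsigned char* b, size_t len) {
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) return false;
  }
  size_t rem = len - i;  // 0..7 trailing bytes
  if (rem == 0) return true;
  // Load a full word anyway; with a big-endian load the garbage bytes end up
  // in the low-order bits, so shift them out of the XOR before testing.
  uint64_t diff = load_be64(a + i) ^ load_be64(b + i);
  return (diff >> (8 * (8 - rem))) == 0;
}
```

Both header sizes (16 and 24 bytes) are multiples of 8, which is why the first element is always 8-byte aligned on 64-bit and the 4-byte fallback loop is rarely, if ever, needed.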

From vladimir.kozlov at oracle.com  Wed Apr 27 15:32:55 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Apr 2016 08:32:55 -0700
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <57208F60.5000607@redhat.com>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
	<ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
	<57208F60.5000607@redhat.com>
Message-ID: <b13f2866-a8cd-b8c4-af99-7ad30a328ed5@oracle.com>

Hi Andrew,

Does the size of the immediate value affect latency on AArch64?
For ConP it is a 64-bit constant and for ConN it is 32-bit.

Also, as Martin pointed out, such constants are now loaded from the constant table on SPARC and PPC. What about AArch64?

Thanks,
Vladimir

On 4/27/16 3:07 AM, Andrew Haley wrote:
> On 27/04/16 10:11, Doerr, Martin wrote:
>> Would this be good for aarch64 as well?
>
> On AArch64, LoadConP is
>
>     mov reg, #x
>     movk reg, #y shl #16
>     movk reg, #z shl 32
>
> (3 cycles latency)
>
> LoadConN + DecodeN heap-based
>
>    mov reg, #x
>    movk reg, #y shl #16
>    add reg, heapbase, reg shl #3
>
> (4 cycles latency)
>
> Andrew.
>
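
Andrew's second sequence above builds a 32-bit narrow oop with two instructions and then decodes it with one shifted add against the heap base. Below is a minimal model of that decode step. The function names are hypothetical. It assumes 8-byte object alignment (a shift of 3) and a non-null constant, so it leaves out the null handling that a general DecodeN needs:

```cpp
#include <cassert>
#include <cstdint>

constexpr int kOopShift = 3;  // assumption: 8-byte object alignment

// Heap-based decode, i.e. what "add reg, heapbase, reg shl #3" computes.
// Assumes narrow != 0; a general DecodeN must map 0 to null.
uint64_t decode_heap_based(uint64_t heap_base, uint32_t narrow) {
  assert(narrow != 0);
  return heap_base + (static_cast<uint64_t>(narrow) << kOopShift);
}

// Inverse mapping for an oop inside the compressed range (at most 32 GB).
uint32_t encode_heap_based(uint64_t heap_base, uint64_t oop) {
  assert(oop > heap_base);
  uint64_t offset = oop - heap_base;
  assert(offset % (uint64_t{1} << kOopShift) == 0);  // object alignment
  assert((offset >> kOopShift) <= UINT32_MAX);       // fits in 32 bits
  return static_cast<uint32_t>(offset >> kOopShift);
}
```

For example, with a heap base of 0x800000000 the narrow value 0x10 decodes to 0x800000080. Both sequences listed above are three instructions long. The decode variant replaces the final movk with this shifted add.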

From aph at redhat.com  Wed Apr 27 15:37:44 2016
From: aph at redhat.com (Andrew Haley)
Date: Wed, 27 Apr 2016 16:37:44 +0100
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <b13f2866-a8cd-b8c4-af99-7ad30a328ed5@oracle.com>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
	<ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
	<57208F60.5000607@redhat.com>
	<b13f2866-a8cd-b8c4-af99-7ad30a328ed5@oracle.com>
Message-ID: <5720DCC8.7060009@redhat.com>

Hi,

On 04/27/2016 04:32 PM, Vladimir Kozlov wrote:

> Does the size of the immediate value affect latency on AArch64?
> For ConP it is a 64-bit constant and for ConN it is 32-bit.

Immediate values are always 16 bits.  To load more than 16 bits you
have to use multiple instructions.  An AArch64 address is 48 bits
wide, so we need three 1-cycle instructions.
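
The sketch below shows where the instruction count comes from. It is a minimal illustration, not HotSpot's MacroAssembler code, and the function names are hypothetical. It splits a 48-bit address into the three 16-bit immediates that a "mov; movk shl #16; movk shl #32" sequence materializes:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: splits a 48-bit address into the 16-bit immediates of
// "mov reg, #x; movk reg, #y, lsl #16; movk reg, #z, lsl #32".
std::array<uint16_t, 3> split_48bit_address(uint64_t addr) {
  assert((addr >> 48) == 0);  // only 48 significant address bits
  std::array<uint16_t, 3> imm{};
  for (int i = 0; i < 3; i++) {
    imm[i] = static_cast<uint16_t>((addr >> (16 * i)) & 0xFFFF);
  }
  return imm;
}

// Models the register contents after the three instructions.
uint64_t materialize(const std::array<uint16_t, 3>& imm) {
  uint64_t reg = imm[0];                       // mov (movz): bits 0-15, rest zeroed
  reg |= static_cast<uint64_t>(imm[1]) << 16;  // movk: replaces bits 16-31
  reg |= static_cast<uint64_t>(imm[2]) << 32;  // movk: replaces bits 32-47
  return reg;  // OR is equivalent to movk here because those bits were still zero
}
```

For example, 0x7f12345678ab splits into 0x78ab, 0x3456 and 0x7f12. Each 16-bit chunk needs its own instruction, which gives three instructions for a full 48-bit address.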

> Also, as Martin pointed out, such constants are now loaded from the
> constant table on SPARC and PPC. What about AArch64?

No, never.  It's too slow: 5 cycles latency, more if you miss L1
cache.

Andrew.

From rwestrel at redhat.com  Wed Apr 27 15:53:04 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 27 Apr 2016 17:53:04 +0200
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
	<C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>
Message-ID: <5720E060.2050308@redhat.com>


Hi Michael,

Thanks for the answer.

> The answer could be conditional if we had machines with enough byte
> or short components to make vectors with, I chose INT as it is the
> current consistent minimum configuration for complete vector mapping.
> The best answer would be to create some code which mines the common
> type used in the current loops expressions, but I think we would be
> stuck with two passes over the code, the first to bind the common
> type, the second for finding the optimal sub vector mapping.  Or
> possibly moving the question to the machine layer as a query, where
> compiler writers choose the minimum consistent configuration based on
> current info on the machine we compile on.

Would two passes, as sketched here:

http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/

do the job?

Roland.

From rwestrel at redhat.com  Wed Apr 27 15:57:17 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 27 Apr 2016 17:57:17 +0200
Subject: RFR(XS): 8155015: Aarch64: bad assert in spill generation code
In-Reply-To: <571E1AE8.4020000@oracle.com>
References: <571E190E.5050600@redhat.com> <571E1AE8.4020000@oracle.com>
Message-ID: <5720E15D.3050006@redhat.com>

Thanks for the review, Tobias!

Roland.

On 04/25/2016 03:26 PM, Tobias Hartmann wrote:
> Hi Roland,
> 
> On 25.04.2016 15:18, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8155015/webrev.00/
>>
>> I hit that broken assert when doing some testing.
> 
> That looks good to me!
> 
> Best regards,
> Tobias
> 
>>
>> Roland.
>>

From tobias.hartmann at oracle.com  Wed Apr 27 16:08:35 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 27 Apr 2016 18:08:35 +0200
Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC
In-Reply-To: <18792245-5f8c-1f09-3359-4c89c33d17bd@oracle.com>
References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com>
	<57166721.5010208@oracle.com>
	<B6AD0FCA-31D0-4D19-BABC-CE3579E6B4BA@oracle.com>
	<571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com>
	<571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com>
	<DC5053CC-CE39-4FA1-BC7F-8C74EE40D761@oracle.com>
	<571F500E.5020904@oracle.com>
	<499d9f6b-378f-e350-eb72-c35930360825@oracle.com>
	<57207C36.3030101@oracle.com>
	<18792245-5f8c-1f09-3359-4c89c33d17bd@oracle.com>
Message-ID: <5720E403.6000400@oracle.com>

Thanks, Vladimir!

Best regards,
Tobias
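
The header-size arithmetic in the quoted discussion below can be checked with a small model. The names are hypothetical. The model mirrors the quoted expression align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize) and assumes a 64-bit VM with HeapWordSize == 8 and a 4-byte int:

```cpp
#include <cstddef>

constexpr std::size_t kHeapWordSize = 8;  // 64-bit VM
constexpr std::size_t kMarkWordSize = 8;

// Rounds size up to a multiple of alignment (alignment must be a power of two).
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Mark word, then the klass field (4 bytes compressed, 8 otherwise), then the
// 4-byte length; elements start at the word-aligned end of the header.
constexpr std::size_t array_base_offset(bool compressed_class_pointers) {
  return align_up(kMarkWordSize + (compressed_class_pointers ? 4 : 8) + sizeof(int),
                  kHeapWordSize);
}

static_assert(array_base_offset(true) == 16, "8 + 4 + 4 = 16");
static_assert(array_base_offset(false) == 24, "8 + 8 + 4 = 20, rounded up to 24");
```

Both results are multiples of 8. On a 64-bit VM, this is why the first array element is 8-byte aligned with or without compressed class pointers.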

On 27.04.2016 16:56, Vladimir Kozlov wrote:
> Thank you for verifying the positive condition.
> Changes are good.
> 
> Thanks,
> Vladimir
> 
> On 4/27/16 1:45 AM, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> On 27.04.2016 09:48, Vladimir Kozlov wrote:
>>> I think next should be greaterEqual (branch from loop when limit+8 == 0) to avoid reading 8 bytes beyond array.
>>>
>>> +  // Bail out if we reached the end (but still do the comparison)
>>> +  br(Assembler::positive, false, Assembler::pn, Lremaining);
>>
>> According to the SPARC manual [1], bpos branches if "not N", i.e. if limit is >= 0. I verified this with a small program (attached) using inline function templates [2]:
>>
>> $ cc -O main.c code.il
>> $ ./a.out
>>  1: 1
>>  0: 1
>> -1: 0
>>
>> That means the bpos branch is taken for 0 and 1 but not for -1. So the code should be correct.
>>
>>> Another trick you can use is xorcc instead of cmp.
>>> Then you need only one srlx and compare it with zero (maybe use cmp_zero_and_br()).
>>
>> Yes, John already suggested this but as I wrote in another email, it does not improve but slightly degrade performance.
>>
>> Thanks,
>> Tobias
>>
>> [1] http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf
>> [2] https://docs.oracle.com/cd/E26502_01/html/E28387/gmabr.html
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/26/16 4:25 AM, Tobias Hartmann wrote:
>>>> Thanks, Chris!
>>>>
>>>> On 25.04.2016 21:29, Christian Thalinger wrote:
>>>>>
>>>>>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>>>
>>>>>> Hi Mikael,
>>>>>>
>>>>>> On 25.04.2016 09:59, Mikael Gerdin wrote:
>>>>>>> Hi Tobias
>>>>>>>
>>>>>>> On 2016-04-25 08:38, Tobias Hartmann wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug).
>>>>>>>
>>>>>>> I think that if you disable compressed klass pointers (or compressed oops) then I suspect that array contents are not always 8 byte aligned.
>>>>>>>
>>>>>>> With compressed oops enabled the mark word + compressed class + alength
>>>>>>> add up to 16 bytes and thus the array contents are guarateed to be 8 byte aligned.
>>>>>>> If compressed klass pointers are enabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case.
>>>>>>
>>>>>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned.
>>>>>
>>>>> See arrayOop.hpp:
>>>>>
>>>>>   // Returns the offset of the first element.
>>>>>   static int base_offset_in_bytes(BasicType type) {
>>>>>     return header_size(type) * HeapWordSize;
>>>>>   }
>>>>
>>>> I verified that Mikael's claim is right:
>>>> With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4).
>>>> Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4).
>>>>
>>>> However, the header size is always aligned to HeapWordSize:
>>>>
>>>> static int header_size_in_bytes() {
>>>>   size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize);
>>>>
>>>> and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned.
>>>>
>>>> Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance:
>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Tobias
>>>>>>
>>>>>>>
>>>>>>> /Mikael
>>>>>>>
>>>>>>>>
>>>>>>>> If there are no objections, I would like to push the basic version:
>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>>
>>>>>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>> On 20.04.2016 15:31, Tobias Hartmann wrote:
>>>>>>>>> Hi John,
>>>>>>>>>
>>>>>>>>> On 20.04.2016 03:46, John Rose wrote:
>>>>>>>>>> So I started looking at your code and my inner SPARC junkie took over.
>>>>>>>>>>
>>>>>>>>>> This is what happened:
>>>>>>>>>>   http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/
>>>>>>>>>
>>>>>>>>> Thanks a lot for having a look!
>>>>>>>>>
>>>>>>>>>> Perhaps there are some ideas that might be helpful:
>>>>>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>>>>>>>>>
>>>>>>>>> Right, this simplifies the code a bit:
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/
>>>>>>>>>
>>>>>>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is?
>>>>>>>>>
>>>>>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less?
>>>>>>>>>
>>>>>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed:
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01
>>>>>>>>>
>>>>>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1].
>>>>>>>>>
>>>>>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd).
>>>>>>>>>
>>>>>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tobias
>>>>>>>>>
>>>>>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
>>>>>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>>> [3] Runtime alignment checks:
>>>>>>>>>
>>>>>>>>> bind(Lunaligned);
>>>>>>>>> Label next;
>>>>>>>>> xor3(ary1, ary2, tmp);
>>>>>>>>> and3(tmp, 7, tmp);
>>>>>>>>> br_null_short(tmp, Assembler::pn, next);
>>>>>>>>> STOP("One array is unaligned!");
>>>>>>>>> should_not_reach_here();
>>>>>>>>> bind(next);
>>>>>>>>> STOP("Both arrays are unaligned!");
>>>>>>>>>
>>>>>>>>>> On the other hand, what you wrote is nice and simple.
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>> ? John
>>>>>>>>>>
>>>>>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more
>>>>>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
>>>>>>>>>> But that's neither nice nor simple.
>>>>>>>>>>
>>>>>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks, Vladimir!
>>>>>>>>>>>
>>>>>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote:
>>>>>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed.
>>>>>>>>>>>
>>>>>>>>>>> Okay, I'll push the basic version.
>>>>>>>>>>>
>>>>>>>>>>> For reference, here are the results on a SPARC T4:
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png
>>>>>>>>>>>
>>>>>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>>>>>>> We do have arraycopy code for it but by default we don't use it:
>>>>>>>>>>>>  product(uintx,  ArraycopySrcPrefetchDistance, 0,
>>>>>>>>>>>>  product(uintx,  ArraycopyDstPrefetchDistance, 0,
>>>>>>>>>>>>
>>>>>>>>>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code.
>>>>>>>>>>>
>>>>>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0:
>>>>>>>>>>>
>>>>>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version
>>>>>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0
>>>>>>>>>>> Error: Could not create the Java Virtual Machine.
>>>>>>>>>>> Error: A fatal exception has occurred. Program will exit
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Tobias
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> please review the following enhancement:
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938
>>>>>>>>>>>>>
>>>>>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals().
>>>>>>>>>>>>>
>>>>>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value().
>>>>>>>>>>>>>
>>>>>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I evaluated the following three versions of the patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Basic --
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/
>>>>>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>>>>>
>>>>>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>>>>> Version "small" tries to improve this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Prefetching --
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/
>>>>>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Small --
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
>>>>>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays").
>>>>>>>>>>>>>
>>>>>>>>>>>>> The numbers can be found here:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Tobias
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java
>>>>>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
>>>>>>>>>>>>> [3] Microbenchmark results for the "basic" implementation
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png
>>>>>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png
>>>>>
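
The core idea of the intrinsic described in the quoted RFR above can be sketched in portable C++: compare wide words, read past the end, then shift out the garbage bytes. This is an illustration under stated assumptions, not the SPARC MacroAssembler code. Words are assembled big-endian to mimic an ldx load on SPARC. The caller must guarantee that both buffers are readable up to the length rounded up to 8 bytes; on the Java heap this follows from the 8-byte alignment discussed in the thread. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assembles 8 bytes into a big-endian word, as an ldx load does on SPARC.
static uint64_t load_be64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) {
    v = (v << 8) | p[i];
  }
  return v;
}

// Compares len bytes, 8 at a time. The last, partial word is loaded whole, so
// it may contain bytes past the end ("garbage"); a shift discards them.
bool array_equals_sketch(const uint8_t* a, const uint8_t* b, std::size_t len) {
  assert(a != nullptr && b != nullptr);
  std::size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) {
      return false;  // early exit on the first mismatching word
    }
  }
  std::size_t remaining = len - i;  // 0..7 valid bytes left
  if (remaining == 0) {
    return true;
  }
  uint64_t diff = load_be64(a + i) ^ load_be64(b + i);
  // Big-endian: the valid bytes are the most significant ones, so a right
  // shift (srlx on SPARC) drops the 8 - remaining garbage bytes.
  diff >>= 8 * (8 - remaining);
  return diff == 0;
}
```

Reading past the end is safe only because of the heap layout. In general C++ it would be undefined behavior, which is why the sketch makes the padding the caller's responsibility.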

From nassim.halli at aselta.com  Wed Apr 27 17:17:41 2016
From: nassim.halli at aselta.com (Nassim HALLI)
Date: Wed, 27 Apr 2016 19:17:41 +0200
Subject: Are Java native methods inlined ?
In-Reply-To: <5720956F.1060808@redhat.com>
References: <CAOS5t8KwKCVA3e=7X1n9_pu1qjdrUrwXcmtAYS8cNCphZvAckw@mail.gmail.com>
	<5720956F.1060808@redhat.com>
Message-ID: <5720F435.2080908@aselta.com>

Hi Andrew,

Thanks for the clarification.

Nassim.

On 27/04/2016 12:33, Andrew Haley wrote:
> On 27/04/16 10:12, Nassim Halli wrote:
>
>> Just a simple question : are Java native methods inlined ?
> Not exactly.
>
>> By "native methods" I mean the interface between the Java caller and
>> the targeted native JNI function in the dl (for which the
>> JIT-compiled code can be observed using *+PrintNativeNMethods)*.
>>
>> I ask this question since I don't observe inlining with Java 8 and I
>> heard it was. Although I understand that it could not necessarily
>> lead to great improvements it's just for a clarification.
> The glue code between Java and native code is not inlined into its
> caller but it is generated by the JIT compiler.  Like every
> optimization in HotSpot, this is only done for hot code.  If you run
> something like Netbeans you'll see a lot of native nmethods generated.
>
> Andrew.


From christian.thalinger at oracle.com  Wed Apr 27 19:01:02 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 27 Apr 2016 09:01:02 -1000
Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable
	array fields
In-Reply-To: <5720BF26.2050400@oracle.com>
References: <571A36D7.2030701@oracle.com> <5720BF26.2050400@oracle.com>
Message-ID: <9869ACCB-BB3C-40E8-A5A0-585EFB347DAE@oracle.com>

Ok.

> On Apr 27, 2016, at 3:31 AM, Roland Schatz <roland.schatz at oracle.com> wrote:
> 
> Please don't integrate this yet. We're looking for a more general solution, it's possible this won't be needed in the final version.
> 
> Thanks,
> Roland
> 
> On 04/22/2016 04:36 PM, Andreas Woess wrote:
>> Please review:
>> http://cr.openjdk.java.net/~aw/8154765/webrev/
>> https://bugs.openjdk.java.net/browse/JDK-8154765
>> 
>> This change adds an optional stableDimensions parameter to ConstantReflectionProvider.readStableFieldValue that allows the number of stable array dimensions to be specified explicitly, at a finer granularity than inferring it from the type of the field.
>> 
>> Thanks,
>> Andreas
>> 
> 


From dean.long at oracle.com  Wed Apr 27 19:17:42 2016
From: dean.long at oracle.com (Dean Long)
Date: Wed, 27 Apr 2016 12:17:42 -0700
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
	<ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
Message-ID: <6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com>

On 4/27/2016 2:11 AM, Doerr, Martin wrote:
> Hi Roland,
>
> I have removed the piece of my change which would interfere with your change.
>
> But I'd like to explain the intention of skipping the transformation on non-x86 platforms in heap-based compressed oops mode.
> (In simpler compressed oops modes decoding is very cheap so the transformation is probably good.)
> Without transformation we have: LoadConP + Storage access
> With transformation we have: LoadConN + DecodeN heap-based + Storage access
>
> I believe X86 is the only platform which can match the DecodeN heap-based into the Storage access.
>
> I guess other platforms should prefer the untransformed version:
> PPC can load the ConP from constant pool. Decoding takes a lot of instructions, because the heap base needs to get loaded.
>
> I didn't take a closer look at SPARC, but I thought it would use the constant pool as well. Not sure if the following comment is still correct:
> // On sparc loading 32-bits constant and decoding it have less
> // instructions (4) then load 64-bits constant (7).
>
>
> Therefore, I had proposed the following code to skip:
> +         // Matching decode heap based into an operand only works on X86.
> + #if !defined(X86)
> +         if ((op == Op_ConN      && Universe::narrow_oop_base()   != NULL) ||
> +             (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) {
> +           break;
> +         }
> + #endif
> +
>
> Would this be good for aarch64 as well?
> Would you like to include code which skips the transformation in your change or should this better be discussed independently?

Martin, wouldn't your #ifdef X86 code be better as a Matcher function, 
similar to narrow_oop_use_complex_address()?

dl
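
Dean's suggestion replaces a compile-time #if in shared code with a per-platform query. The sketch below shows the shape of that pattern in a minimal, self-contained form. Every name in it is hypothetical; it is not the Matcher API. Each platform answers whether it can fold a heap-based decode into the memory operand, and the shared code asks that question instead of testing for X86 directly:

```cpp
#include <cassert>

enum class Cpu { X86, AArch64, PPC, SPARC };

// Per-platform answer. In HotSpot such a hook would be defined by each
// platform; a switch is used here so all platforms can be shown at once.
bool can_fold_heap_based_decode_into_address(Cpu cpu) {
  switch (cpu) {
    case Cpu::X86:
      return true;   // decode can be matched into the memory operand
    default:
      return false;  // decode costs extra instructions
  }
}

// Shared-code decision: is it worth turning a pointer constant into a
// narrow constant plus decode?
bool prefer_narrow_constant(Cpu cpu, bool heap_based_mode) {
  if (!heap_based_mode) {
    return true;  // simpler compressed-oop modes decode cheaply
  }
  return can_fold_heap_based_decode_into_address(cpu);
}
```

The design benefit is that each new port supplies its own answer without touching the shared code.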

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Roland Westrelin [mailto:rwestrel at redhat.com]
> Sent: Mittwoch, 27. April 2016 10:00
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode
>
> Hi Martin,
>
>> seems like this issue is related to what I have sent out today:
>> RFR(S): 8154836: VM crash due to "Base pointers must match"
>> I also had to change the AddP case of final graph reshaping.
>>
>> In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode.
>>
>> Maybe I have to remove that part of my change, or at least adapt it.
>> We should make sure that the changes don't get pushed on the same day.
> Thanks for the heads up. It looks like your change will get in before
> mine. I'll send an updated webrev once it's integrated.
>
> Roland.


From john.r.rose at oracle.com  Wed Apr 27 21:13:15 2016
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 27 Apr 2016 14:13:15 -0700
Subject: Are Java native methods inlined ?
In-Reply-To: <5720956F.1060808@redhat.com>
References: <CAOS5t8KwKCVA3e=7X1n9_pu1qjdrUrwXcmtAYS8cNCphZvAckw@mail.gmail.com>
	<5720956F.1060808@redhat.com>
Message-ID: <4E516677-5A69-4133-9962-AF64270B0A2C@oracle.com>

On Apr 27, 2016, at 3:33 AM, Andrew Haley <aph at redhat.com> wrote:
> 
> The glue code between Java and native code is not inlined into its
> caller but it is generated by the JIT compiler.  Like every
> optimization in HotSpot, this is only done for hot code.  If you run
> something like Netbeans you'll see a lot of native nmethods generated.

In Project Panama we are experimenting with inlining wrapper logic
for native calls.  But the 20-year practice in HotSpot is (as you say)
to roll a separate, non-inlined wrapper for each JNI method.  The JIT
per se does not create this wrapper, but rather this logic:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/61a214186dae/src/cpu/x86/vm/sharedRuntime_x86_64.cpp#l1825

-- John

From john.r.rose at oracle.com  Wed Apr 27 21:55:09 2016
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 27 Apr 2016 14:55:09 -0700
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <5720E060.2050308@redhat.com>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
	<C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>
	<5720E060.2050308@redhat.com>
Message-ID: <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com>

It is reasonable to look ahead into the loop to find the largest applicable vector size, before choosing an unroll factor.
A loop which works on bytes and doubles at the same time will want to unroll only up to vector-of-double.
But a loop which works only on bytes will want to unroll more.
Is that what we are talking about here?
-- John
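
The two-pass idea can be sketched as follows. The first pass collects the element sizes used by the loop's vectorizable operations, and the second sizes the unroll by the widest one. This is a hypothetical illustration only. It is not SuperWord::unrolling_analysis(), and it ignores loop-body size limits and any cost model:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pass 1: the widest element type bounds how many lanes all operations
// in the loop can share.
int widest_element_bytes(const std::vector<int>& element_bytes) {
  assert(!element_bytes.empty());
  return *std::max_element(element_bytes.begin(), element_bytes.end());
}

// Pass 2: unroll so that one unrolled iteration fills one vector of the
// widest type.
int vector_unroll_factor(int max_vector_bytes, const std::vector<int>& element_bytes) {
  int widest = widest_element_bytes(element_bytes);
  assert(widest > 0 && max_vector_bytes % widest == 0);
  return max_vector_bytes / widest;
}
```

With 32-byte vectors, a loop that touches bytes and doubles ({1, 8}) gets a factor of 4. A loop that touches only bytes ({1}) gets 32.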

> On Apr 27, 2016, at 8:53 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
> 
> 
> Hi Michael,
> 
> Thanks for the answer.
> 
>> The answer could be conditional if we had machines with enough byte
>> or short components to make vectors with, I chose INT as it is the
>> current consistent minimum configuration for complete vector mapping.
>> The best answer would be to create some code which mines the common
>> type used in the current loops expressions, but I think we would be
>> stuck with two passes over the code, the first to bind the common
>> type, the second for finding the optimal sub vector mapping.  Or
>> possibly moving the question to the machine layer as a query, where
>> compiler writers choose the minimum consistent configuration based on
>> current info on the machine we compile on.
> 
> Would two passes, as sketched here:
>
> http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/
>
> do the job?
> 
> Roland.


From vladimir.kozlov at oracle.com  Wed Apr 27 21:55:16 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Apr 2016 14:55:16 -0700
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <5720CA98.10605@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
	<571F5756.7020007@oracle.com>
	<0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>
	<5720CA98.10605@oracle.com>
Message-ID: <8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com>

Hi Zoltan

On 4/27/16 7:20 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> thank you for the feedback!
>
> On 04/27/2016 12:39 AM, Vladimir Kozlov wrote:
>> On 4/26/16 4:56 AM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> thank you for the feedback! Please see comments below.
>>>
>>> On 04/22/2016 12:37 AM, Vladimir Kozlov wrote:
>>>> Hi, Zoltan
>>>>
>>>> I think we should change code in prefetch_allocation() instead:
>>>>
>>>>       Node *cache_adr = new AddPNode(old_eden_top, old_eden_top,
>>>> _igvn.MakeConX(step_size + distance));
>>>
>>> The problem is that AllocatePrefetchStyle must align the first prefetched address to 8 bytes, otherwise the emitted stxa
>>> instruction could cause a SIGBUS. But the alignment does not have to be at step_size boundary, 8-byte alignment is
>>> sufficient.
>>
>> Actually it has to be step_size (cache line size) aligned - BIS instruction is triggered when the address of stxa is
>> the beginning of cache line for AllocatePrefetchStyle == 3. If it is just 8 bytes aligned it will be simple store.
>
> Thank you for clarifying.
>
>> We should require AllocatePrefetchStepSize to be 8 bytes aligned in vm_version_sparc.cpp instead of:
>> +       // BIS instructions require 8-byte aligned addresses
>> +       Node* mask = _igvn.MakeConX(~(intptr_t)(wordSize - 1));
>
> I agree. But I'd prefer we do that in the AllocatePrefetchStepSizeConstraintFunc() constraint function in
> commandLineFlagConstraintsCompiler.cpp. The reason is that since JEP-245 the preferred place to validate
> command-line arguments is in constraint functions. Most compiler-related checks were moved there with JDK-8078554 and
> JDK-8146478.

Okay. Usually we do platform-specific flag settings in vm_version_<arch>.cpp files, but it looks like that is starting to change.
I am not a supporter of having multiple #ifdefs in shared code (in commandLineFlagConstraintsCompiler).

>
> I'd also like to set the minimum value for AllocatePrefetchDistance to AllocatePrefetchStepSize, otherwise we can access
> the heap before newly allocated objects/arrays.

Only for style 3. And as I said, it may be better to change the code.
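
The discussion quoted above covers two constraints. BIS takes effect only when the stxa address is the start of a cache line, and a lower bound on AllocatePrefetchDistance has been proposed. A small model illustrates both. It assumes that step_size is the cache-line size and a power of two and that addresses are plain integers. The names are hypothetical; this is not the prefetch_allocation() code:

```cpp
#include <cassert>
#include <cstdint>

bool is_power_of_2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// First block-initializing store address: old_top + distance, rounded down
// to a cache-line (step_size) boundary so the store starts a cache line.
uint64_t first_bis_address(uint64_t old_top, uint64_t distance, uint64_t step_size) {
  assert(is_power_of_2(step_size) && step_size >= 8);  // 8-byte aligned step
  return (old_top + distance) & ~(step_size - 1);
}
```

Rounding down can move the address back by up to step_size - 1 bytes. With distance >= step_size, the result therefore stays above old_top and never touches memory before the newly allocated object.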

>>>
>>>>
>>>> This way we allow AllocatePrefetchDistance == 0 in all AllocatePrefetchStyle cases, which is consistent.
>>>
>>> Unfortunately, it is not easy to have the same limit for AllocatePrefetchDistance on all platforms. Due to the 8-byte
>>> alignment performed by compiled code, the lower limit of 8 for AllocatePrefetchDistance is needed on SPARC; the lower
>>> limit of 0 works fine on all other platforms.
>>>
>>> I've started looking at the consistency of flags controlling allocation prefetch in general, as we have other issues
>>> open related to them (e.g., JDK-8151622). We're touching related code now, so I thought, maybe it makes sense to fix all
>>> remaining issues at once.
>>
>> Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - it should be used only if UseTLAB is true. Please,
>> look.
>
> It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true.
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844

I got a report of it crashing on T7 (attached). Maybe that is because BIS is used with it by default.

Thanks,
Vladimir

>
> Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, I've started an RBT run with all hotspot on all
> platforms using AllocatePrefetchStyle=2. So far no problems have shown up.
>
>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch should be set for all styles on all platforms to
>> avoid accessing beyond the heap. The prefetch instruction documentation says they do not trap, but we should be careful.
>
> I agree.
>
> That means we initialize _reserve_for_allocation_prefetch in a platform-independent way. So I think it would make sense
> to move that field to ThreadLocalAllocBuffer, as TLAB is the only user of that field and we don't support
> platform-independent initialization of Abstract_VM_Version. I did that in the updated webrev.
>
>>>
>>> The updated webrev does the following (in addition to fixing the original problem with AllocatePrefetchDistance):
>>>
>>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., BIS instructions are used for prefetching). As
>>> far as I understand, AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so if BIS is enabled, we
>>> should use AllocatePrefetchStyle = 3.
>>
>> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select AllocatePrefetchStyle = 3.
>
> OK.
>
>> But we should allow AllocatePrefetchStyle = 3  if normal prefetch instructions (or other platforms) are used.
>
> OK, I've removed that restriction.
>
>> I think we should update comment in globals.hpp to say "generate one prefetch per cache line" without saying BIS.
>
> OK.
>
>>
>> But I agree that if BIS is not available we should not use BIS (AllocatePrefetchInstr = 1).
>
> OK.
>
>>
>>>
>>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on non-SPARC platforms.
>>
>> It could be useful on other platforms since it does one access per cache line.
>
> OK, let's keep it available then.
>
>>
>>
>>>
>>> 3. Enforce that AllocatePrefetchStepSize is a multiple of 8 if AllocatePrefetchStyle is 3 (due to alignment requirements).
>>
>> That is correct since stxa requires at least 8 bytes alignment (as stx).
>
> OK.
>
>>
>>>
>>> 4. Determine the number of lines to prefetch in the same way for all prefetch styles:
>>> lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines
>>
>> Agree.
>
> OK.
>
>>
>>>
>>> Here is the updated webrev:
>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/
>>
>> vm_version_sparc.cpp
>> AllocatePrefetchInstr = 0 should be set for all styles (not only 1) when BIS is not available.
>
> OK. I think it is sufficient to do
>
>     if (!has_blk_init()) {
>       if (AllocatePrefetchInstr == 1) {
>         warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable");
>         FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0);
>       }
>     }
>
> I hope I'm not missing anything here. Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/
>
> Testing:
> - JPRT (incl. TestOptionsWithRanges);
> - local testing on SPARC;
> - all hotspot tests with AllocatePrefetchStyle=2 on all platforms.
>
> Thank you!
>
> Best regards,
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Testing:
>>> - JPRT (incl. TestOptionsWithRanges)
>>> - local testing on a SPARC machine.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> please review the patch for 8153340.
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>>>>
>>>>>
>>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way
>>>>> the address for the first prefetch instruction is calculated [1]:
>>>>>
>>>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1), which can zero some of
>>>>> the low-order bits of cache_adr. That results in accesses *before* the newly allocated object.
>>>>>
>>>>>
>>>>> Solution: Set lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3).
>>>>> Unquarantine test.
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>>>>
>>>>> Testing:
>>>>> - JPRT (incl. TestOptionsWithRanges.java)
>>>>> - local testing on a SPARC machine.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941
>>>
>
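The line-count rule from the webrev description ("lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines") combines with the one-prefetch-per-cache-line scheme as shown in the sketch below. It computes the addresses that would be prefetched. The real code in macro.cpp emits ideal-graph nodes instead. `PrefetchFlags` and both helper names are hypothetical, and the step size is again assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bundle of the -XX:AllocatePrefetch* flag values (bytes/counts).
struct PrefetchFlags {
  uint64_t distance;        // AllocatePrefetchDistance
  uint64_t step_size;       // AllocatePrefetchStepSize (assumed power of two)
  int      lines;           // AllocatePrefetchLines
  int      instance_lines;  // AllocateInstancePrefetchLines
};

// Same rule for every prefetch style.
int prefetch_lines(const PrefetchFlags& f, bool instance_allocation) {
  return instance_allocation ? f.instance_lines : f.lines;
}

// One address per cache line, starting at (top + distance) rounded down
// to a step_size boundary.
std::vector<uint64_t> prefetch_addresses(const PrefetchFlags& f, uint64_t top,
                                         bool instance_allocation) {
  assert(f.step_size != 0 && (f.step_size & (f.step_size - 1)) == 0);
  std::vector<uint64_t> addrs;
  uint64_t adr = (top + f.distance) & ~(f.step_size - 1);
  for (int i = 0; i < prefetch_lines(f, instance_allocation); i++) {
    addrs.push_back(adr);
    adr += f.step_size;
  }
  return addrs;
}
```

For example, with a distance of 192, 64-byte lines and old_eden_top = 0x1028, an array allocation with 4 lines would prefetch 0x10C0, 0x1100, 0x1140 and 0x1180. An instance allocation with 1 line would prefetch only 0x10C0.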
-------------- next part --------------
An embedded message was scrubbed...
From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
Subject: Re: Java's use of BIS and RAW penalty
Date: Tue, 15 Mar 2016 11:51:01 -0700
Size: 7622
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160427/ad0d0186/AttachedMessage-0001.mht>

From michael.c.berg at intel.com  Wed Apr 27 21:58:31 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Wed, 27 Apr 2016 21:58:31 +0000
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
	<C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>
	<5720E060.2050308@redhat.com>
	<47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E658E3@FMSMSX102.amr.corp.intel.com>

John, it is pretty much that issue (unrolling for the available supported vector); I am testing some changes now for Roland.

-----Original Message-----
From: John Rose [mailto:john.r.rose at oracle.com] 
Sent: Wednesday, April 27, 2016 2:55 PM
To: rwestrel at redhat.com
Cc: Berg, Michael C <michael.c.berg at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: SuperWord::unrolling_analysis() question

It is reasonable to look ahead into the loop to find the largest applicable vector size, before choosing an unroll factor.
A loop which works on bytes and doubles at the same time will want to unroll only up to vector-of-double.
But a loop which works only on bytes will want to unroll more.
Is that what we are talking about here?
- John

> On Apr 27, 2016, at 8:53 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
> 
> 
> Hi Michael,
> 
> Thanks for the answer.
> 
>> The answer could be conditional if we had machines with enough byte 
>> or short components to make vectors with, I chose INT as it is the 
>> current consistent minimum configuration for complete vector mapping.
>> The best answer would be to create some code which mines the common 
>> type used in the current loop's expressions, but I think we would be 
>> stuck with two passes over the code, the first to bind the common 
>> type, the second for finding the optimal sub vector mapping.  Or 
>> possibly moving the question to the machine layer as a query, where 
>> compiler writers choose the minimum consistent configuration based on 
>> current info on the machine we compile on.
> 
> Would two passes like sketched here:
> 
> http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/
> 
> do the job?
> 
> Roland.

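John's point, together with Roland's two-pass idea, can be shown with a simplified model. Pass 1 scans the loop body for the widest element type. Pass 2 picks the unroll factor that fills one vector of that type. This is an illustration under stated assumptions: a fixed vector width in bytes, and nothing else that SuperWord weighs. It is not the superword.cpp algorithm. `LoopBody` and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Element sizes in bytes of the memory operations in a loop body,
// e.g. 1 for byte[] and 8 for double[]. Purely illustrative.
using LoopBody = std::vector<int>;

// Pass 1: find the widest element type used by the loop.
int widest_element(const LoopBody& body) {
  assert(!body.empty());
  return *std::max_element(body.begin(), body.end());
}

// Pass 2: unroll just enough to fill one vector of the widest type.
int unroll_factor(const LoopBody& body, int vector_bytes) {
  int widest = widest_element(body);
  assert(vector_bytes % widest == 0);
  return vector_bytes / widest;
}
```

With 32-byte vectors, a byte-only loop gets an unroll factor of 32. A loop mixing bytes and doubles stops at 4, which is one vector of doubles.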

From tom.rodriguez at oracle.com  Wed Apr 27 23:51:23 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 27 Apr 2016 16:51:23 -0700
Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly
	handle private methods in interfaces
In-Reply-To: <80797340-B145-42F2-ABED-902197C9AC17@oracle.com>
References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com>
	<80797340-B145-42F2-ABED-902197C9AC17@oracle.com>
Message-ID: <12E0E6EC-C97F-4D7D-96BB-A7A4A6F8A7BD@oracle.com>

One of the jtreg tests had to be fixed after these changes, so I've updated the webrev.  The only changed file is http://cr.openjdk.java.net/~never/8152903/webrev/test/compiler/jvmci/compilerToVM/ResolveMethodTest.java.udiff.html <http://cr.openjdk.java.net/~never/8152903/webrev/test/compiler/jvmci/compilerToVM/ResolveMethodTest.java.udiff.html>

The test case was actually resolving the method against the holder type instead of the receiver but this only mattered for interface types.  We also require the type to be linked before answering the question so we have to force initialization.  

tom

> On Apr 21, 2016, at 8:23 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> Looks good!
> 
> igor
> 
>> On Apr 21, 2016, at 6:22 PM, Tom Rodriguez <tom.rodriguez at oracle.com <mailto:tom.rodriguez at oracle.com>> wrote:
>> 
>> http://cr.openjdk.java.net/~never/8152903/webrev <http://cr.openjdk.java.net/~never/8152903/webrev> 
>> 
>> JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change: if the type is an interface, no meaningful answer can be provided. I updated tests and the interface a little to reflect this. 
>> 
>> Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes.  A modified version of the test which found the issue also passes now.  I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes.  https://bugs.openjdk.java.net/browse/JDK-8154904 <https://bugs.openjdk.java.net/browse/JDK-8154904>
>> 
>> tom
> 


From michael.c.berg at intel.com  Thu Apr 28 02:39:41 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 28 Apr 2016 02:39:41 +0000
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <C568518E7B433348B114B6A7122D474756E658E3@FMSMSX102.amr.corp.intel.com>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
	<C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>
	<5720E060.2050308@redhat.com>
	<47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com>
	<C568518E7B433348B114B6A7122D474756E658E3@FMSMSX102.amr.corp.intel.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E659CC@FMSMSX102.amr.corp.intel.com>

Roland, for superword.cpp you only need this one-line change, which I have tested and which has no negative side effects on x86.  It will address the issue (oddly enough, it's where we started):

Line 201   int max_vector = Matcher::max_vector_size(T_BYTE);

-Michael

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Wednesday, April 27, 2016 2:59 PM
To: John Rose <john.r.rose at oracle.com>; rwestrel at redhat.com
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: SuperWord::unrolling_analysis() question

John, it is pretty much that issue (unrolling for the available supported vector); I am testing some changes now for Roland.

-----Original Message-----
From: John Rose [mailto:john.r.rose at oracle.com]
Sent: Wednesday, April 27, 2016 2:55 PM
To: rwestrel at redhat.com
Cc: Berg, Michael C <michael.c.berg at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: SuperWord::unrolling_analysis() question

It is reasonable to look ahead into the loop to find the largest applicable vector size, before choosing an unroll factor.
A loop which works on bytes and doubles at the same time will want to unroll only up to vector-of-double.
But a loop which works only on bytes will want to unroll more.
Is that what we are talking about here?
- John

> On Apr 27, 2016, at 8:53 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
> 
> 
> Hi Michael,
> 
> Thanks for the answer.
> 
>> The answer could be conditional if we had machines with enough byte 
>> or short components to make vectors with, I chose INT as it is the 
>> current consistent minimum configuration for complete vector mapping.
>> The best answer would be to create some code which mines the common 
>> type used in the current loop's expressions, but I think we would be 
>> stuck with two passes over the code, the first to bind the common 
>> type, the second for finding the optimal sub vector mapping.  Or 
>> possibly moving the question to the machine layer as a query, where 
>> compiler writers choose the minimum consistent configuration based on 
>> current info on the machine we compile on.
> 
> Would two passes like sketched here:
> 
> http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/
> 
> do the job?
> 
> Roland.


From igor.veresov at oracle.com  Thu Apr 28 02:43:30 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 27 Apr 2016 19:43:30 -0700
Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly
	handle private methods in interfaces
In-Reply-To: <12E0E6EC-C97F-4D7D-96BB-A7A4A6F8A7BD@oracle.com>
References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com>
	<80797340-B145-42F2-ABED-902197C9AC17@oracle.com>
	<12E0E6EC-C97F-4D7D-96BB-A7A4A6F8A7BD@oracle.com>
Message-ID: <168363BA-C08D-4FB9-8116-BB4ED6DFE366@oracle.com>

Good.

igor

> On Apr 27, 2016, at 4:51 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> One of the jtreg tests had to be fixed after these changes, so I've updated the webrev.  The only changed file is http://cr.openjdk.java.net/~never/8152903/webrev/test/compiler/jvmci/compilerToVM/ResolveMethodTest.java.udiff.html <http://cr.openjdk.java.net/~never/8152903/webrev/test/compiler/jvmci/compilerToVM/ResolveMethodTest.java.udiff.html>
> 
> The test case was actually resolving the method against the holder type instead of the receiver but this only mattered for interface types.  We also require the type to be linked before answering the question so we have to force initialization.  
> 
> tom
> 
>> On Apr 21, 2016, at 8:23 PM, Igor Veresov <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
>> 
>> Looks good!
>> 
>> igor
>> 
>>> On Apr 21, 2016, at 6:22 PM, Tom Rodriguez <tom.rodriguez at oracle.com <mailto:tom.rodriguez at oracle.com>> wrote:
>>> 
>>> http://cr.openjdk.java.net/~never/8152903/webrev <http://cr.openjdk.java.net/~never/8152903/webrev> 
>>> 
>>> JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change: if the type is an interface, no meaningful answer can be provided. I updated tests and the interface a little to reflect this. 
>>> 
>>> Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes.  A modified version of the test which found the issue also passes now.  I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes.  https://bugs.openjdk.java.net/browse/JDK-8154904 <https://bugs.openjdk.java.net/browse/JDK-8154904>
>>> 
>>> tom
>> 
> 


From robbin.ehn at oracle.com  Thu Apr 28 05:31:23 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 28 Apr 2016 07:31:23 +0200
Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose
In-Reply-To: <5720761A.4010004@oracle.com>
References: <5720761A.4010004@oracle.com>
Message-ID: <9da44f04-6a24-a96a-67c5-c443a74c8df1@oracle.com>

Hi Stefan, thank you for taking care of this!

This looks good to me!

/Robbin

On 04/27/2016 10:19 AM, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to silence the DirectiveParser_test internal VM
> test.
>
> http://cr.openjdk.java.net/~stefank/8155206/webrev.01
> https://bugs.openjdk.java.net/browse/JDK-8155206
>
> Before the patch, we got the following output when running with
> -XX:+ExecuteInternalVMTests:
>
> ...
> Running test: Test_log_file_startup_truncation
> Running test: Test_invalid_log_file
> Running test: DirectivesParser_test
> Internal error on line 1 byte 2: Directive missing required match.
>   Got EOS.
>   At '}'.
> {}
> Internal error on line 1 byte 3: Directive missing required match.
>   At '}]'.
> [{}]
> Internal error on line 1 byte 3: Directive missing required match.
>   At '},{}]'.
> [{},{}]
> Internal error on line 1 byte 2: Directive missing required match.
>   At '},{}'.
> {},{}
> Syntax error on line 2 byte 3: DirectivesParser can only start with an
> array containing directive objects, or one single directive.
>   At '['.
>   [
>     {
>       match: "foo/bar.*",
>       inline : "+java/util.*",
>       PrintAssembly: true,
>       BreakAtExecute: true,
>     }
>   ]
> ]
>
> Value error on line 4 byte 20: The key 'PrintInlining' does not allow an
> array of values.
>   At '['.
>     PrintInlining: [
>       true,
>       false
>     ],
>   }
> ]
>
> Warning: +LogCompilation must be set to enable compilation logging from
> directives
> Warning: +LogCompilation must be set to enable compilation logging from
> directives
> Value error on line 7 byte 9: Method pattern error: Missing leading
> inline type (+/-)
>   At '"foo",'.
>         "foo",
>         "bar",
>       ]
>     }
>   }
> ]
>
> Value error on line 8 byte 9: Method pattern error: Missing leading
> inline type (+/-)
>   At '"bar",'.
>         "bar",
>       ]
>     }
>   }
> ]
>
> Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key.
>   At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'.
> [{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]
> Value error on line 5 byte 12: Key of type match needs a value of type
> string
>   At 'true,'.
>     match: true,
>     inline: true,
>     enable: true,
>     c1: {
>       preset: true,
>     }
>   }
> ]
>
> Running test: Test_TempNewSymbol
> Running test: VMStructs_test
> ...
>
> With the patch the output is much less noisy:
> ...
> Running test: Test_log_file_startup_truncation
> Running test: Test_invalid_log_file
> Running test: DirectivesParser_test
> Warning:  +LogCompilation must be set to enable compilation logging from
> directives
> Warning:  +LogCompilation must be set to enable compilation logging from
> directives
> Running test: Test_TempNewSymbol
> Running test: VMStructs_test
> ...
>
> We might want to get rid of the Warning messages, but I think the
> proposed patch is a good first step.
>
> You can turn on the old output with -XX:+VerboseInternalVMTests.
>
> Tested with -XX:+ExecuteInternalVMTests :)
>
> Thanks,
> StefanK
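The patch's approach is that parser errors are still detected, but they are only printed when verbose output is requested (for example with -XX:+VerboseInternalVMTests). The pattern looks roughly like the sketch below. The `ErrorReporter` class is hypothetical and is not the DirectivesParser code. It only shows the idea that the count of errors stays testable while the printing is optional.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream, used to capture output in tests
#include <string>

// Hypothetical reporter: every error is counted, but it is only written
// to the stream when verbose mode is enabled.
class ErrorReporter {
 public:
  ErrorReporter(std::ostream& out, bool verbose) : out_(out), verbose_(verbose) {}

  void report(int line, int byte, const std::string& msg) {
    assert(line >= 1 && byte >= 1);
    ++count_;
    if (verbose_) {
      out_ << "Internal error on line " << line << " byte " << byte << ": " << msg << "\n";
    }
  }

  int count() const { return count_; }

 private:
  std::ostream& out_;
  bool verbose_;
  int count_ = 0;
};
```

Counting errors separately from printing them lets a negative test still assert that the parser rejected its input, without the noise shown above.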

From stefan.karlsson at oracle.com  Thu Apr 28 05:59:04 2016
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 28 Apr 2016 07:59:04 +0200
Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose
In-Reply-To: <9da44f04-6a24-a96a-67c5-c443a74c8df1@oracle.com>
References: <5720761A.4010004@oracle.com>
	<9da44f04-6a24-a96a-67c5-c443a74c8df1@oracle.com>
Message-ID: <5721A6A8.5070408@oracle.com>

Thanks, Robbin.

StefanK

On 2016-04-28 07:31, Robbin Ehn wrote:
> Hi Stefan, thank you for taking care of this!
>
> This looks good to me!
>
> /Robbin
>
> On 04/27/2016 10:19 AM, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to silence the DirectiveParser_test internal VM
>> test.
>>
>> http://cr.openjdk.java.net/~stefank/8155206/webrev.01
>> https://bugs.openjdk.java.net/browse/JDK-8155206
>>
>> Before the patch, we got the following output when running with
>> -XX:+ExecuteInternalVMTests:
>>
>> ...
>> Running test: Test_log_file_startup_truncation
>> Running test: Test_invalid_log_file
>> Running test: DirectivesParser_test
>> Internal error on line 1 byte 2: Directive missing required match.
>>   Got EOS.
>>   At '}'.
>> {}
>> Internal error on line 1 byte 3: Directive missing required match.
>>   At '}]'.
>> [{}]
>> Internal error on line 1 byte 3: Directive missing required match.
>>   At '},{}]'.
>> [{},{}]
>> Internal error on line 1 byte 2: Directive missing required match.
>>   At '},{}'.
>> {},{}
>> Syntax error on line 2 byte 3: DirectivesParser can only start with an
>> array containing directive objects, or one single directive.
>>   At '['.
>>   [
>>     {
>>       match: "foo/bar.*",
>>       inline : "+java/util.*",
>>       PrintAssembly: true,
>>       BreakAtExecute: true,
>>     }
>>   ]
>> ]
>>
>> Value error on line 4 byte 20: The key 'PrintInlining' does not allow an
>> array of values.
>>   At '['.
>>     PrintInlining: [
>>       true,
>>       false
>>     ],
>>   }
>> ]
>>
>> Warning: +LogCompilation must be set to enable compilation logging from
>> directives
>> Warning: +LogCompilation must be set to enable compilation logging from
>> directives
>> Value error on line 7 byte 9: Method pattern error: Missing leading
>> inline type (+/-)
>>   At '"foo",'.
>>         "foo",
>>         "bar",
>>       ]
>>     }
>>   }
>> ]
>>
>> Value error on line 8 byte 9: Method pattern error: Missing leading
>> inline type (+/-)
>>   At '"bar",'.
>>         "bar",
>>       ]
>>     }
>>   }
>> ]
>>
>> Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key.
>>   At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'.
>> [{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]
>> Value error on line 5 byte 12: Key of type match needs a value of type
>> string
>>   At 'true,'.
>>     match: true,
>>     inline: true,
>>     enable: true,
>>     c1: {
>>       preset: true,
>>     }
>>   }
>> ]
>>
>> Running test: Test_TempNewSymbol
>> Running test: VMStructs_test
>> ...
>>
>> With the patch the output is much less noisy:
>> ...
>> Running test: Test_log_file_startup_truncation
>> Running test: Test_invalid_log_file
>> Running test: DirectivesParser_test
>> Warning:  +LogCompilation must be set to enable compilation logging from
>> directives
>> Warning:  +LogCompilation must be set to enable compilation logging from
>> directives
>> Running test: Test_TempNewSymbol
>> Running test: VMStructs_test
>> ...
>>
>> We might want to get rid of the Warning messages, but I think the
>> proposed patch is a good first step.
>>
>> You can turn on the old output with -XX:+VerboseInternalVMTests.
>>
>> Tested with -XX:+ExecuteInternalVMTests :)
>>
>> Thanks,
>> StefanK


From zoltan.majo at oracle.com  Thu Apr 28 08:25:17 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 28 Apr 2016 10:25:17 +0200
Subject: RFR(S): 8154836: VM crash due to "Base pointers must match"
In-Reply-To: <a32cc4b4-db9b-53fb-2efa-91b6442b1b46@oracle.com>
References: <d3e6c01010164382a4e4edb22f8923b9@DEWDFE13DE14.global.corp.sap>
	<57201A20.7000504@oracle.com>
	<1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap>
	<a32cc4b4-db9b-53fb-2efa-91b6442b1b46@oracle.com>
Message-ID: <5721C8ED.8040507@oracle.com>

The latest patch looks good to me as well. I'll push it today.

Thank you!

Best regards,


Zoltan

On 04/27/2016 04:54 PM, Vladimir Kozlov wrote:
> This looks good.
>
> Thanks,
> Vladimir
>
> On 4/27/16 1:40 AM, Doerr, Martin wrote:
>> Hi Vladimir and Zoltan,
>>
>> thanks for reviewing.
>>
>> I have made the requested changes:
>> - Removed the code which skips the transformation on non-X86 
>> platforms in heap-based compressed oops mode. I think we can discuss 
>> that independently.
>> - Improved the assertion as suggested by Vladimir.
>>
>> New webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/ 
>>
>>
>> Zoltan, you have already offered to sponsor this change. Thanks.
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Mittwoch, 27. April 2016 03:47
>> To: Doerr, Martin <martin.doerr at sap.com>; Zoltán Majó 
>> <zoltan.majo at oracle.com>; hotspot-compiler-dev at openjdk.java.net 
>> compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match"
>>
>> Thank you, Martin
>>
>> On 4/25/16 4:04 AM, Doerr, Martin wrote:
>>> Hi all,
>>>
>>> we have already seen such an assertion in final_graph_reshaping.
>>>
>>> We found 2 AddP nodes in a row. The ideal graph looked like this 
>>> (simplified):
>>> N0 ConP
>>> N1 ConN
>>> N2 AddP(Base = N0, Address = N0)
>>> N3 AddP(Base = N0, Address = N2)
>>>
>>> Final graph reshaping visited N2 before N3 and changed the graph:
>>> N0 ConP
>>> N1 ConN
>>> N4 DecodeN
>>> N2 AddP(Base = N4, Address = N4)
>>> N3 AddP(Base = N0, Address = N2)
>>>
>>> Afterwards, final graph reshaping visited N3 and ran into the 
>>> assertion. The Base of N3 is unexpected.
>>>
>>> I made a change to reconnect N3's Base input to N4, too.
>>>
>>> Webrev is here:
>>> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/ 
>>>
>>
>> Fix looks good.
>>
>>>
>>> In addition to fixing this problem, I added an assertion to check if 
>>> there are more than 3 AddP nodes in a row.
>>> I wouldn't expect that to happen. Not sure if this assertion is 
>>> desired.
>>
>> We should not have a 3rd AddP, but I agree with the assert. You should add
>> an additional check to the assert:
>>
>> || out_j->in(AddPNode::Base) != addp
>>
>>>
>>> I made an additional change: I think the graph transformation 
>>> doesn't make sense if decoding is expensive.
>>> Therefore, I skip it on non-X86 platforms when we're running in heap 
>>> based compressed oops mode.
>>> (I believe X86 is the only platform which can match the decoding in 
>>> the operand in this case.)
>>
>> It is not related to the fix and should be done separately, or not 
>> at all.
>> There is a comment above the code which explains why it could be 
>> beneficial on SPARC too:
>>
>>       // On sparc loading 32-bits constant and decoding it have less
>>       // instructions (4) then load 64-bits constant (7).
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Please review. I will also need a sponsor, please.
>>>
>>> Best regards,
>>> Martin
>>>
>>>

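Martin's description of the fix can be reduced to a toy graph. After N2's Base/Address inputs are rewired from the ConP (N0) to the new DecodeN (N4), any AddP that uses N2 as its Address and still has N0 as its Base must be rewired as well. Otherwise the "Base pointers must match" invariant fails. The sketch below models only that step. `Node`, `rewire_base` and `bases_match` are hypothetical and much simpler than C2's AddPNode and final_graph_reshaping.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy ideal-graph node: an opcode plus the Base/Address inputs of an AddP.
struct Node {
  std::string op;
  Node* base = nullptr;
  Node* address = nullptr;
};

// Rewire `addp` from `old_base` to `decoded`, then fix up AddP users that
// chain on `addp` but still reference `old_base` as their Base.
void rewire_base(Node* addp, Node* old_base, Node* decoded,
                 const std::vector<Node*>& users) {
  assert(addp->op == "AddP");
  if (addp->base == old_base)    addp->base = decoded;
  if (addp->address == old_base) addp->address = decoded;
  for (Node* u : users) {
    if (u->op == "AddP" && u->address == addp && u->base == old_base) {
      u->base = decoded;
    }
  }
}

// Invariant: an AddP whose Address is another AddP must share its Base.
bool bases_match(const Node* outer) {
  const Node* inner = outer->address;
  return inner == nullptr || inner->op != "AddP" || inner->base == outer->base;
}
```

If the users are not updated, N3 ends up with Base N0 while its Address (N2) has Base N4. That is the mismatch the assertion caught.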

From martin.doerr at sap.com  Thu Apr 28 09:40:02 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 28 Apr 2016 09:40:02 +0000
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
	<ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
	<6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com>
Message-ID: <c12dce50d0fc414ca29160db1d0c6837@DEWDFE13DE14.global.corp.sap>

Hi all,

thanks for all your comments.

We could introduce something like
x86:
bool Matcher::const_oop_prefer_decode() { return true; }
non-x86:
bool Matcher::const_oop_prefer_decode() { return Universe::narrow_oop_base() == NULL; }

all platforms:
bool Matcher::const_klass_prefer_decode() { return Universe::narrow_klass_base() == NULL; }
(Seems like matching of DecodeNKlass as operand is not implemented on x86.)

Roland, sorry that we discuss so much of my stuff in your thread for 8154826.
I think it is slightly related to your change and it touches the same files. So it might be easier to sell it as add-on to your change?

If you prefer to handle it separated from 8154826, please let me know.
I could open a new bug for it. Disadvantage would be that someone will have to take care of closed source platforms and I will need a sponsor from Oracle.

Best regards,
Martin

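The proposed Matcher predicates encode a simple policy. Turn a ConP into ConN + DecodeN only when the decode is cheap: either there is no heap base, or the platform can fold a heap-based decode into the memory operand (x86, per the discussion above). The standalone sketch below shows that decision. `Platform` and its fields are hypothetical stand-ins for Universe::narrow_oop_base(), Universe::narrow_klass_base() and per-CPU matcher knowledge.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical platform description for this sketch only.
struct Platform {
  bool      can_match_heap_based_decode;  // true for x86 per the thread
  uintptr_t narrow_oop_base;              // 0 when no heap base is needed
  uintptr_t narrow_klass_base;            // 0 when no klass base is needed
};

// Prefer ConN + DecodeN for oops only if decoding is cheap on this platform.
bool const_oop_prefer_decode(const Platform& p) {
  return p.can_match_heap_based_decode || p.narrow_oop_base == 0;
}

// DecodeNKlass is reportedly not matched as an operand even on x86,
// so only a zero klass base makes the transformation attractive.
bool const_klass_prefer_decode(const Platform& p) {
  return p.narrow_klass_base == 0;
}
```

Expressing the policy as a function of platform facts, rather than as `#if !defined(X86)` in shared code, matches Dean's suggestion of a Matcher function similar to narrow_oop_use_complex_address().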
 
-----Original Message-----
From: Dean Long [mailto:dean.long at oracle.com] 
Sent: Mittwoch, 27. April 2016 21:18
To: Doerr, Martin <martin.doerr at sap.com>; rwestrel at redhat.com; Zoltán Majó <zoltan.majo at oracle.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode

On 4/27/2016 2:11 AM, Doerr, Martin wrote:
> Hi Roland,
>
> I have removed the piece of my change which would interfere with your change.
>
> But I'd like to explain the intention of skipping the transformation on non-x86 platforms in heap-based compressed oops mode.
> (In simpler compressed oops modes decoding is very cheap so the transformation is probably good.)
> Without transformation we have: LoadConP + Storage access
> With transformation we have: LoadConN + DecodeN heap-based + Storage access
>
> I believe X86 is the only platform which can match the DecodeN heap-based into the Storage access.
>
> I guess other platforms should prefer the untransformed version:
> PPC can load the ConP from constant pool. Decoding takes a lot of instructions, because the heap base needs to get loaded.
>
> I didn't take a closer look at SPARC, but I thought it would use the constant pool as well. Not sure if the following comment is still correct:
> // On sparc loading 32-bits constant and decoding it have less
> // instructions (4) then load 64-bits constant (7).
>
>
> Therefore, I had proposed the following code to skip:
> +         // Matching decode heap based into an operand only works on X86.
> + #if !defined(X86)
> +         if ((op == Op_ConN      && Universe::narrow_oop_base()   != NULL) ||
> +             (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) {
> +           break;
> +         }
> + #endif
> +
>
> Would this be good for aarch64 as well?
> Would you like to include code which skips the transformation in your change or should this better be discussed independently?

Martin, wouldn't your #ifdef X86 code be better as a Matcher function, 
similar to narrow_oop_use_complex_address()?

dl

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Roland Westrelin [mailto:rwestrel at redhat.com]
> Sent: Mittwoch, 27. April 2016 10:00
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode
>
> Hi Martin,
>
>> seems like this issue is related to what I have sent out today:
>> RFR(S): 8154836: VM crash due to "Base pointers must match"
>> I also had to change the AddP case of final graph reshaping.
>>
>> In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode.
>>
>> Maybe I have to remove that part of my change, or at least adapt it.
>> We should make sure that the changes don't get pushed on the same day.
> Thanks for the heads up. It looks like your change will get in before
> mine. I'll send an updated webrev once it's integrated.
>
> Roland.


From rwestrel at redhat.com  Thu Apr 28 09:50:06 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 11:50:06 +0200
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
Message-ID: <5721DCCE.5050602@redhat.com>


http://cr.openjdk.java.net/~roland/8155612/webrev.00/

When the superword optimization encounters a misaligned offset, it tries
to align the loop induction variable so the memory access is aligned. So
even though the aarch64 port only supports aligned vector memory accesses
(Matcher::misaligned_vectors_ok() returns false), vector nodes can still
get a misaligned offset, which results in:

# Internal Error
(/scratch/rwestrel/hs-comp/hotspot/src/cpu/aarch64/vm/assembler_aarch64.hpp:251),
pid=3780, tid=3807
# guarantee(chk == -1 || chk == 0) failed: Field too big for insn
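For background, here is a sketch of why a misaligned offset can trip an encoding check on AArch64. It is based on the architectural encodings, not on the webrev. A 128-bit (Q register) `LDR`/`STR` with an unsigned immediate scales its 12-bit offset by 16, so only non-negative multiples of 16 fit. The unscaled `LDUR`/`STUR` form takes a signed 9-bit byte offset. One plausible way to hit such a guarantee is forcing an offset like 8 into the scaled form. The enum and function names are hypothetical:

```cpp
#include <cstdint>

enum class QAddrForm { ScaledImm12, UnscaledImm9, NeedsRegister };

// Classifies a byte offset for a 128-bit (Q register) load/store.
QAddrForm classify_q_offset(std::int64_t offset) {
  // LDR/STR (immediate, unsigned offset): imm12 is scaled by the 16-byte
  // access size, so the offset must be a non-negative multiple of 16 <= 4095*16.
  if (offset >= 0 && offset % 16 == 0 && offset / 16 <= 4095) {
    return QAddrForm::ScaledImm12;
  }
  // LDUR/STUR: signed 9-bit unscaled byte offset.
  if (offset >= -256 && offset <= 255) {
    return QAddrForm::UnscaledImm9;
  }
  // Otherwise the address has to be formed in a register first.
  return QAddrForm::NeedsRegister;
}
```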

Roland.

From zoltan.majo at oracle.com  Thu Apr 28 09:50:31 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 28 Apr 2016 11:50:31 +0200
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
	<571F5756.7020007@oracle.com>
	<0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>
	<5720CA98.10605@oracle.com>
	<8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com>
Message-ID: <5721DCE7.1020102@oracle.com>

Hi Vladimir,


On 04/27/2016 11:55 PM, Vladimir Kozlov wrote:
> [...]
>> I agree. But I'd prefer we do that in the 
>> AllocatePrefetchStepSizeConstraintFunc() constraint function (in
>> commandLineFlagConstraintsCompiler.cpp). The reason is that since 
>> JEP-245 the preferred place to validate
>> command-line arguments is in constraint functions. Most 
>> compiler-related checks were moved there with JDK-8078554 and
>> JDK-8146478.
>
> Okay. Usually we do platform-specific flag settings in 
> vm_version_<arch>.cpp files, but it looks like that is starting to change.
> I am not a supporter of having multiple #ifdefs in shared code (in 
> commandLineFlagConstraintsCompiler).

JEP-245 proposed to have ranges/constraints defined for all flags. Some 
of these ranges/constraints are unavoidably architecture-specific.

I agree with you that having lots of #ifdefs in shared code does not 
improve code readability. But I'm wondering what would be a good way to 
achieve the goals of JEP-245 without using too many #ifdefs. Maybe 
architecture-specific constraint files (e.g., 
commandLineFlagConstraintsCompiler_solaris.cpp, 
commandLineFlagConstraintsCompiler_x86.cpp, etc.)?

At least with the current change we won't add more #ifdefs...

>
>>
>> I'd also like to set the minimum value for AllocatePrefetchDistance 
>> to AllocatePrefetchStepSize, otherwise we can access
>> the heap before newly allocated objects/arrays.
>
> Only for style 3. And I said it may be better to change code.

Yes, you've mentioned that in the first round of reviews and I missed 
your point in the subsequent rounds. Sorry for that.

I've updated the code in macro.cpp and the constraint 
AllocatePrefetchDistanceConstraintFunc() as you've suggested.
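A minimal sketch of why the lower bound works, assuming a power-of-two step size. `first_prefetch_addr()` models the rounding described in the original report. `prefetch_flags_ok()` is a hypothetical check combining the style-3 rules agreed in this thread; HotSpot's real constraint functions have a different signature:

```cpp
#include <cstdint>

// Offset old_eden_top by the prefetch distance, then round down to a
// multiple of step_size (assumed to be a power of two).
std::uintptr_t first_prefetch_addr(std::uintptr_t old_eden_top,
                                   std::uintptr_t distance,
                                   std::uintptr_t step_size) {
  std::uintptr_t cache_adr = old_eden_top + distance;
  return cache_adr & ~(step_size - 1);
}

// Hypothetical constraint check for AllocatePrefetchStyle == 3.
bool prefetch_flags_ok(std::intptr_t style, std::intptr_t distance,
                       std::intptr_t step_size) {
  if (style != 3) return true;
  // Rounding down loses at most step_size - 1 bytes, so distance >= step_size
  // keeps the first prefetch at or above old_eden_top.
  if (distance < step_size) return false;
  // Block-initializing stores (stxa) need at least 8-byte alignment.
  if (step_size % 8 != 0) return false;
  return true;
}
```

For example, with `old_eden_top = 0x1038` and a 64-byte step, a distance of 0 rounds down to `0x1000`, which is below the new object. A distance of 64 gives `0x1040`.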

> [...]
>>>
>>> Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - 
>>> it should be used only if UseTLAB is true. Please,
>>> look.
>>
>> It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true.
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844 
>>
>
> I got a report of it crashing on T7 (attached). Maybe it is because 
> it uses BIS by default.

I think using BIS with AllocatePrefetchStyle=2 is indeed the cause.

I've executed the following on our T7 machine with b115:
* java -XX:AllocatePrefetchStyle=2 (uses AllocatePrefetchInstr=1 by 
default) -> crashes
* java -XX:AllocatePrefetchStyle=2 -XX:AllocatePrefetchInstr=0 -> works

With the newest webrev, both commands pass.
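A hypothetical model of the two ergonomic rules discussed further down in this thread: fall back from BIS when it is unavailable, and pair BIS with style 3. It is a sketch of the stated rules, not the code in webrev.03. Under these rules, the T7 default (instr 1) no longer runs with style 2:

```cpp
#include <cstdio>

// instr == 1 selects BIS for prefetching (SPARC-specific in practice).
struct PrefetchSettings {
  int instr;  // models AllocatePrefetchInstr
  int style;  // models AllocatePrefetchStyle
};

PrefetchSettings adjust_prefetch_settings(PrefetchSettings s, bool has_blk_init) {
  // Without BIS support, fall back to ordinary prefetch instructions.
  if (!has_blk_init && s.instr == 1) {
    std::fprintf(stderr, "warning: BIS instructions required for AllocatePrefetchInstr 1 unavailable\n");
    s.instr = 0;
  }
  // BIS-based prefetching is paired with style 3 (one access per cache line),
  // which avoids the crashing style 2 + BIS combination.
  if (s.instr == 1) {
    s.style = 3;
  }
  return s;
}
```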

Here is the newest webrev:
http://cr.openjdk.java.net/~zmajo/8153340/webrev.03/

I re-did testing with JPRT, the results look OK.

Thank you!

Best regards,


Zoltan

>
> Thanks,
> Vladimir
>
>>
>> Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, 
>> I've started an RBT run with all hotspot on all
>> platforms using AllocatePrefetchStyle=2. So far no problems have 
>> shown up.
>>
>>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch 
>>> should be set for all styles on all platforms to
>>> avoid accessing beyond the heap. The prefetch instruction docs say that they 
>>> do not trap, but we should be careful.
>>
>> I agree.
>>
>> That means we initialize _reserve_for_allocation_prefetch in a 
>> platform-independent way. So I think it would make sense
>> to move that field to ThreadLocalAllocBuffer, as TLAB is the only 
>> user of that field and we don't support
>> platform-independent initialization of Abstract_VM_Version. I did 
>> that in the updated webrev.
>>
>>>>
>>>> The updated webrev does the following (in addition to fixing the 
>>>> original problem with AllocatePrefetchDistance):
>>>>
>>>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 
>>>> (i.e., BIS instructions are used for prefetching). As
>>>> far as I understand, AllocatePrefetchStyle = 3 was added to support 
>>>> prefetching with BIS, so if BIS is enabled, we
>>>> should use AllocatePrefetchStyle = 3.
>>>
>>> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should 
>>> select AllocatePrefetchStyle = 3.
>>
>> OK.
>>
>>> But we should allow AllocatePrefetchStyle = 3  if normal prefetch 
>>> instructions (or other platforms) are used.
>>
>> OK, I've removed that restriction.
>>
>>> I think we should update comment in globals.hpp to say "generate one 
>>> prefetch per cache line" without saying BIS.
>>
>> OK.
>>
>>>
>>> But I agree: if BIS is not available we should not use BIS 
>>> (AllocatePrefetchInstr = 1).
>>
>> OK.
>>
>>>
>>>>
>>>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on 
>>>> non-SPARC platforms.
>>>
>>> It could be useful on other platforms since it does one access per 
>>> cache line.
>>
>> OK, let's keep it available then.
>>
>>>
>>>
>>>>
>>>> 3. Enforce that AllocatePrefetchStepSize is multiple of 8 if 
>>>> AllocatePrefetchStyle is 3 (due to alignment requirements).
>>>
>>> That is correct since stxa requires at least 8-byte alignment (as 
>>> does stx).
>>
>> OK.
>>
>>>
>>>>
>>>> 4. Determine the number of lines to prefetch in the same way for 
>>>> all prefetch styles:
>>>> lines = (prefetch instance allocation) ? 
>>>> AllocateInstancePrefetchLines : AllocatePrefetchLines
>>>
>>> Agree.
>>
>> OK.
>>
>>>
>>>>
>>>> Here is the updated webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/
>>>
>>> vm_version_sparc.cpp
>>> AllocatePrefetchInstr = 0 should be set for all styles (not only 1) 
>>> when BIS is not available.
>>
>> OK. I think it is sufficient to do
>>
>>   if (!has_blk_init()) {
>>     if (AllocatePrefetchInstr == 1) {
>>       warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable");
>>       FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0);
>>     }
>>   }
>>
>> I hope I'm not missing anything here. Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/
>>
>> Testing:
>> - JPRT (incl. TestOptionsWithRanges);
>> - local testing on SPARC;
>> - all hotspot tests with AllocatePrefetchStyle=2 on all platforms.
>>
>> Thank you!
>>
>> Best regards,
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Testing:
>>>> - JPRT (incl. TestOptionsWithRanges)
>>>> - local testing on a SPARC machine.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> please review the patch for 8153340.
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>>>>>
>>>>>>
>>>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and 
>>>>>> AllocatePrefetchDistance==0. The crash happens due to the way
>>>>>> the address for the first prefetch instruction is calculated [1]:
>>>>>>
>>>>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= 
>>>>>> ~(AllocatePrefetchStepSize - 1), which can zero some of
>>>>>> the low bits of cache_adr. That results in accesses *before* the newly 
>>>>>> allocated object.
>>>>>>
>>>>>>
>>>>>> Solution: Set lower limit of AllocatePrefetchDistance to 
>>>>>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3).
>>>>>> Unquarantine test.
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>>>>>
>>>>>> Testing:
>>>>>> - JPRT (incl. TestOptionsWithRanges.java)
>>>>>> - local testing on a SPARC machine.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>> [1] 
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941
>>>>
>>


From aph at redhat.com  Thu Apr 28 09:58:59 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 28 Apr 2016 10:58:59 +0100
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
In-Reply-To: <5721DCCE.5050602@redhat.com>
References: <5721DCCE.5050602@redhat.com>
Message-ID: <5721DEE3.5010100@redhat.com>

On 28/04/16 10:50, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8155612/webrev.00/

OK, thanks.

Andrew.


From nils.eliasson at oracle.com  Thu Apr 28 12:01:19 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 28 Apr 2016 14:01:19 +0200
Subject: [9] RFR(S): 8142464: PlatformLoggerTest.java throws
	java.lang.RuntimeException: Logger test.logger.bar does not exist
Message-ID: <5721FB8F.40207@oracle.com>

Hi,

Please review the fix of 
jdk/test/sun/util/logging/PlatformLoggerTest.java that has been failing 
intermittently in our nightlies.

Summary:
The test uses loggers that are accessible by weak refs from a 
LogManager. Since the test doesn't keep a strong ref to the loggers they 
may get collected during the test.

Solution:
Save loggers to a static field in the test class.
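The failure mode and the fix can be modeled deterministically with a C++ analogue. Here `std::weak_ptr` stands in for Java's weak references, and the registry stands in for a LogManager that only holds loggers weakly. All names are hypothetical. Unlike Java's GC, the C++ entry expires as soon as the last strong reference goes away:

```cpp
#include <map>
#include <memory>
#include <string>

struct Logger {
  std::string name;
};

// Registry that, like the LogManager described above, holds loggers weakly.
class Registry {
 public:
  std::shared_ptr<Logger> get_or_create(const std::string& name) {
    if (auto existing = loggers_[name].lock()) return existing;
    auto created = std::make_shared<Logger>(Logger{name});
    loggers_[name] = created;
    return created;
  }
  // Returns nullptr once nobody holds a strong reference any more.
  std::shared_ptr<Logger> find(const std::string& name) const {
    auto it = loggers_.find(name);
    if (it == loggers_.end()) return nullptr;
    return it->second.lock();
  }

 private:
  std::map<std::string, std::weak_ptr<Logger>> loggers_;
};

// The fix from this thread, transposed: keep a strong reference in a static.
static std::shared_ptr<Logger> g_keep_alive;
```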

Other:
Also removing "@compile -XDignore.symbol.file", which is unnecessary after 
Jigsaw. The compile tag forces a compile even if the class already 
exists, wasting time during reruns.

Testing:
Running tests on all platforms.

Bug: https://bugs.openjdk.java.net/browse/JDK-8142464
webrev: http://cr.openjdk.java.net/~neliasso/8142464/webrev_jdk.01/ (JDK)

Regards,
Nils Eliasson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160428/94ba2fc2/attachment.html>

From stefan.karlsson at oracle.com  Thu Apr 28 12:11:06 2016
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 28 Apr 2016 14:11:06 +0200
Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose
In-Reply-To: <5721F893.9020801@oracle.com>
References: <5720761A.4010004@oracle.com> <5721F893.9020801@oracle.com>
Message-ID: <5721FDDA.8060804@oracle.com>

(Adding back hotspot-compiler-dev)

Thanks, Nils.

StefanK

On 2016-04-28 13:48, Nils Eliasson wrote:
> Hi Stefan,
>
> Looks good!
>
> Reviewed.
>
> Regards,
> Nils
>
> On 2016-04-27 10:19, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to silence the DirectiveParser_test internal 
>> VM test.
>>
>> http://cr.openjdk.java.net/~stefank/8155206/webrev.01
>> https://bugs.openjdk.java.net/browse/JDK-8155206
>>
>> Before the patch, we got the following output when running with 
>> -XX:+ExecuteInternalVMTests:
>>
>> ...
>> Running test: Test_log_file_startup_truncation
>> Running test: Test_invalid_log_file
>> Running test: DirectivesParser_test
>> Internal error on line 1 byte 2: Directive missing required match.
>>   Got EOS.
>>   At '}'.
>> {}
>> Internal error on line 1 byte 3: Directive missing required match.
>>   At '}]'.
>> [{}]
>> Internal error on line 1 byte 3: Directive missing required match.
>>   At '},{}]'.
>> [{},{}]
>> Internal error on line 1 byte 2: Directive missing required match.
>>   At '},{}'.
>> {},{}
>> Syntax error on line 2 byte 3: DirectivesParser can only start with 
>> an array containing directive objects, or one single directive.
>>   At '['.
>>   [
>>     {
>>       match: "foo/bar.*",
>>       inline : "+java/util.*",
>>       PrintAssembly: true,
>>       BreakAtExecute: true,
>>     }
>>   ]
>> ]
>>
>> Value error on line 4 byte 20: The key 'PrintInlining' does not allow 
>> an array of values.
>>   At '['.
>>     PrintInlining: [
>>       true,
>>       false
>>     ],
>>   }
>> ]
>>
>> Warning: +LogCompilation must be set to enable compilation logging 
>> from directives
>> Warning: +LogCompilation must be set to enable compilation logging 
>> from directives
>> Value error on line 7 byte 9: Method pattern error: Missing leading 
>> inline type (+/-)
>>   At '"foo",'.
>>         "foo",
>>         "bar",
>>       ]
>>     }
>>   }
>> ]
>>
>> Value error on line 8 byte 9: Method pattern error: Missing leading 
>> inline type (+/-)
>>   At '"bar",'.
>>         "bar",
>>       ]
>>     }
>>   }
>> ]
>>
>> Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key.
>>   At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'.
>> [{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]
>> Value error on line 5 byte 12: Key of type match needs a value of 
>> type string
>>   At 'true,'.
>>     match: true,
>>     inline: true,
>>     enable: true,
>>     c1: {
>>       preset: true,
>>     }
>>   }
>> ]
>>
>> Running test: Test_TempNewSymbol
>> Running test: VMStructs_test
>> ...
>>
>> With the patch the output is much less noisy:
>> ...
>> Running test: Test_log_file_startup_truncation
>> Running test: Test_invalid_log_file
>> Running test: DirectivesParser_test
>> Warning:  +LogCompilation must be set to enable compilation logging 
>> from directives
>> Warning:  +LogCompilation must be set to enable compilation logging 
>> from directives
>> Running test: Test_TempNewSymbol
>> Running test: VMStructs_test
>> ...
>>
>> We might want to get rid of the Warning messages, but I think the 
>> proposed patch is a good first step.
>>
>> You can turn on the old output with -XX:+VerboseInternalVMTests.
>>
>> Tested with -XX:+ExecuteInternalVMTests :)
>>
>> Thanks,
>> StefanK
>


From rwestrel at redhat.com  Thu Apr 28 12:29:14 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 14:29:14 +0200
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
In-Reply-To: <5721DEE3.5010100@redhat.com>
References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com>
Message-ID: <5722021A.8000306@redhat.com>

Thank you for the review, Andrew.
I need a sponsor because of the test case.

Roland.

From zoltan.majo at oracle.com  Thu Apr 28 13:04:58 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 28 Apr 2016 15:04:58 +0200
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
In-Reply-To: <5722021A.8000306@redhat.com>
References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com>
	<5722021A.8000306@redhat.com>
Message-ID: <57220A7A.80400@oracle.com>

Hi Roland,


On 04/28/2016 02:29 PM, Roland Westrelin wrote:
> Thank you for the review, Andrew.
> I need a sponsor because of the test case.

I can sponsor your change.

Best regards,


Zoltan

>
> Roland.


From adinn at redhat.com  Thu Apr 28 13:33:01 2016
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 28 Apr 2016 14:33:01 +0100
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
In-Reply-To: <5722021A.8000306@redhat.com>
References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com>
	<5722021A.8000306@redhat.com>
Message-ID: <5722110D.8040601@redhat.com>

On 28/04/16 13:29, Roland Westrelin wrote:
> Thank you for the review, Andrew.
> I need a sponsor because of the test case.

If you need a second AArch64 reviewer then the patch also has my
imprimatur (that's a yes :-).

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From edward.nevill at gmail.com  Thu Apr 28 13:49:48 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 28 Apr 2016 14:49:48 +0100
Subject: RFR: 8155617: aarch64: ClearArray does not use DC ZVA
Message-ID: <1461851388.10531.27.camel@mint>

Hi,

Please review the following webrev

http://cr.openjdk.java.net/~enevill/8155617/bzero6

https://bugs.openjdk.java.net/browse/JDK-8155617

This is the bzero3 version previously discussed on the aarch64 list with the inner DC ZVA outlined. The outlining of the DC ZVA loop made no measurable difference to performance.

I have also tuned the BlockZeroingLowLimit to default to 4 x cache line size rather than always defaulting to 256.
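A rough sketch of the shape of a DC ZVA zeroing strategy, under stated assumptions rather than taken from the webrev. The `DCZID_EL0` decoding follows the Arm architecture: bits [3:0] hold log2 of the block size in 4-byte words, and bit 4 (DZP) prohibits DC ZVA. Requests below a low limit use plain stores. Longer ones use plain stores up to a block boundary, whole DC ZVA blocks, then a store tail. Function names are hypothetical:

```cpp
#include <cstdint>

// DCZID_EL0: bits [3:0] = log2(block size in 4-byte words); bit 4 = DZP.
inline bool dc_zva_permitted(std::uint64_t dczid) { return (dczid & 0x10) == 0; }
inline std::uint64_t dc_zva_block_bytes(std::uint64_t dczid) { return 4ull << (dczid & 0xf); }

struct ZeroPlan {
  std::uint64_t head_bytes;  // plain stores up to the first block boundary
  std::uint64_t zva_blocks;  // whole blocks zeroed with DC ZVA
  std::uint64_t tail_bytes;  // plain stores after the last whole block
};

// block must be a power of two; requests shorter than low_limit skip DC ZVA.
ZeroPlan plan_zeroing(std::uint64_t addr, std::uint64_t len,
                      std::uint64_t block, std::uint64_t low_limit) {
  if (len < low_limit) {
    return {len, 0, 0};
  }
  std::uint64_t aligned = (addr + block - 1) & ~(block - 1);
  std::uint64_t head = aligned - addr;
  if (head >= len) {
    return {len, 0, 0};
  }
  std::uint64_t rest = len - head;
  return {head, rest / block, rest % block};
}
```

With a 64-byte line, a low limit of 4 x cache line size is 256 bytes. For example, `plan_zeroing(0x1010, 300, 64, 256)` splits into 48 head bytes, 3 DC ZVA blocks and 60 tail bytes.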

Updated performance charts here:-

http://cr.openjdk.java.net/~enevill/8155617/bzero6.pdf

The charts show the performance improvement on three different partners' HW.

The benchmark was the following JMH test provided by Andrew Haley:

http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java

The charts have been normalised so that the original jdk9 hs-comp tree is shown as 100%. The figures are % of original performance, so lower is better. This is done to avoid disclosing absolute performance information on partners' HW.

Orig: Original jdk9 hs-comp
bzero6: jdk9 hs-comp with bzero6
Orig (no pref): Original jdk9 hs-comp (-XX:AllocatePrefetchStyle=0)
bzero6 (no pref): jdk9 hs-comp with bzero6 (-XX:AllocatePrefetchStyle=0)

There is significant interaction between prefetch and block zeroing as discussed previously. Some partners benefit from prefetch, others do not.

The proposed patch does not change the behaviour of prefetch (i.e. it leaves it enabled), as I think there should be a separate tuning exercise to tune prefetch for different partners' HW.

OK to push?

Ed.


From aph at redhat.com  Thu Apr 28 14:06:36 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 28 Apr 2016 15:06:36 +0100
Subject: RFR: 8155617: aarch64: ClearArray does not use DC ZVA
In-Reply-To: <1461851388.10531.27.camel@mint>
References: <1461851388.10531.27.camel@mint>
Message-ID: <572218EC.1010201@redhat.com>

On 04/28/2016 02:49 PM, Edward Nevill wrote:
> The proposed patch does not change the behaviour of prefetch (ie. it leaves it enabled) as I think there should be a separate tuning exercise to tune prefetch for different partners HW.

OK, but as this is a severe performance regression for Partner X I
expect the prefetch-tuning patch very soon.

Andrew.

From dmitrij.pochepko at oracle.com  Thu Apr 28 14:28:33 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 28 Apr 2016 17:28:33 +0300
Subject: RFR(XS): 8155163 - JVMCI:
	MethodHandleAccessProvider.resolveInvokeBasicTarget implementation
	doesn't match javadoc
Message-ID: <57221E11.4000406@oracle.com>

Hi,

please review fix for JDK-8155163 - JVMCI: 
MethodHandleAccessProvider.resolveInvokeBasicTarget implementation 
doesn't match javadoc

The MethodHandleAccessProvider.resolveInvokeBasicTarget implementation 
throws an NPE if the first parameter doesn't represent a MethodHandle, but 
according to the javadoc it should return null in that case. This small fix 
makes the implementation match the javadoc.

CR: https://bugs.openjdk.java.net/browse/JDK-8155163
Webrev: http://cr.openjdk.java.net/~dpochepk/8155163/webrev.01/

I've tested this change on linux_x64 via jprt.

Thanks,
Dmitrij
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160428/1d1ff6b0/attachment.html>

From rwestrel at redhat.com  Thu Apr 28 14:33:15 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 16:33:15 +0200
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
In-Reply-To: <57220A7A.80400@oracle.com>
References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com>
	<5722021A.8000306@redhat.com> <57220A7A.80400@oracle.com>
Message-ID: <57221F2B.50606@redhat.com>

Thanks for offering to sponsor, Zoltan and thanks for the second review,
Andrew D.!

Roland.

From zoltan.majo at oracle.com  Thu Apr 28 14:41:37 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 28 Apr 2016 16:41:37 +0200
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
In-Reply-To: <57221F2B.50606@redhat.com>
References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com>
	<5722021A.8000306@redhat.com> <57220A7A.80400@oracle.com>
	<57221F2B.50606@redhat.com>
Message-ID: <57222121.9030501@oracle.com>

Hi,


On 04/28/2016 04:33 PM, Roland Westrelin wrote:
> Thanks for offering to sponsor, Zoltan and thanks for the second review,
> Andrew D.!

@Roland: you are welcome!

@Andrew: Unfortunately, I started the push job without mentioning you in 
the description. Once I've seen your review, it was already too late. 
Sorry for that.

Best regards,


Zoltan

>
> Roland.


From adinn at redhat.com  Thu Apr 28 14:53:20 2016
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 28 Apr 2016 15:53:20 +0100
Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned
	offset
In-Reply-To: <57222121.9030501@oracle.com>
References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com>
	<5722021A.8000306@redhat.com> <57220A7A.80400@oracle.com>
	<57221F2B.50606@redhat.com> <57222121.9030501@oracle.com>
Message-ID: <572223E0.5080805@redhat.com>

On 28/04/16 15:41, Zoltán Majó wrote:
> Hi,
> 
> 
> On 04/28/2016 04:33 PM, Roland Westrelin wrote:
>> Thanks for offering to sponsor, Zoltan and thanks for the second review,
>> Andrew D.!
> 
> @Roland: you are welcome!
> 
> @Andrew: Unfortunately, I started the push job without mentioning you in
> the description. Once I've seen your review, it was already too late.
> Sorry for that.

No problem. I don't think anyone is worried about keeping a precise
score :-)

regards,


Andrew Dinn
-----------


From rwestrel at redhat.com  Thu Apr 28 14:55:26 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 16:55:26 +0200
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <C568518E7B433348B114B6A7122D474756E659CC@FMSMSX102.amr.corp.intel.com>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
	<C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>
	<5720E060.2050308@redhat.com>
	<47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com>
	<C568518E7B433348B114B6A7122D474756E658E3@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E659CC@FMSMSX102.amr.corp.intel.com>
Message-ID: <5722245E.7090202@redhat.com>

Hi Michael,

> Roland, for superword.cpp you only need this one-line change,
> which I have tested and which has no negative side effects on
> x86.  It will address the issue (oddly enough, it's where we started):
> 
> Line 201   int max_vector = Matcher::max_vector_size(T_BYTE);

Thanks for experimenting with this. That looks fine to me. I can make
that change as part of the patch that enables superword unrolling
analysis on aarch64 unless you'd like to do it as a separate change?

Roland.

From rwestrel at redhat.com  Thu Apr 28 15:43:03 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 17:43:03 +0200
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <c12dce50d0fc414ca29160db1d0c6837@DEWDFE13DE14.global.corp.sap>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
	<ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
	<6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com>
	<c12dce50d0fc414ca29160db1d0c6837@DEWDFE13DE14.global.corp.sap>
Message-ID: <57222F87.3040602@redhat.com>


Hi Martin,

> Roland, sorry that we discuss so much of my stuff in your thread for
> 8154826. I think it is slightly related to your change and it touches
> the same files. So it might be easier to sell it as an add-on to your
> change?
> 
> If you prefer to handle it separated from 8154826, please let me
> know. I could open a new bug for it. Disadvantage would be that
> someone will have to take care of closed source platforms and I will
> need a sponsor from Oracle.

I would say it would be cleaner to keep the 2 changes separated (a
change that affects all platforms hidden in a platform specific one
might be confusing to someone looking at the history of changes). But I
don't really have a strong opinion either so I'll let those who would do
the extra sponsoring work decide. Vladimir, what do you think?

Roland.

From tom.rodriguez at oracle.com  Thu Apr 28 16:13:50 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 09:13:50 -0700
Subject: RFR 8155047: [JVMCI] findLeafConcreteSubtype should handle arrays of
	leaf concrete subtype
Message-ID: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com>

http://cr.openjdk.java.net/~never/8155047/webrev

findLeafConcreteSubtype should use the same machinery for the elemental type when identifying leaf array types.

tom

From tom.rodriguez at oracle.com  Thu Apr 28 16:26:43 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 09:26:43 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
Message-ID: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>

http://cr.openjdk.java.net/~never/8154483/webrev

This is a collection of small improvements to IGV developed while working on Graal.  I've categorized them below:

Bug fixes:
  Reset folder in top component to release reference to old graphs 
  Fix concurrent modification exception in IGV 
  Fix HTML quoting in tooltips and remove useless entries from search box 

UI improvements:
  Fixed keybinding for open and save actions in IGV 
  Allow importing multiple files at once in IGV 
  Make node searches look through the graph history for a match 

Performance related improvements:
  Add info message about time spent parsing files 
  Relax expensive assert in IGV 
  Reduce overhead of hash computation for graph identity checks 
  Increase Integer cache size in IGV 
  Share properties in IGV 

Binary graph format:
  Add folder and graph property support to binary graphs in IGV 
  Handle Strings larger than buffer size properly in IGV

From rwestrel at redhat.com  Thu Apr 28 16:35:30 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 18:35:30 +0200
Subject: RFR(S): 8154943: AArch64: redundant address computation instructions
	with vectorization
Message-ID: <dk64malofn1.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8154943/webrev.00/

Example generated code: 

 ;; B106: #	B106 B107 <- B105 B106 Loop: B106-B106 inner main of N612 Freq: 293550 

  0x000003ffa9275650: sbfiz	x0, x4, #3, #32 
  0x000003ffa9275654: add	x3, x5, x0 
  0x000003ffa9275658: sbfiz	x7, x4, #3, #32 
  0x000003ffa927565c: ldr	q19, [x3,#16] 
  0x000003ffa9275660: add	x19, x5, x7 
  0x000003ffa9275664: ldr	q20, [x19,#32] ;*daload {reexecute=0 rethrow=0 return_oop=0} 
                                                ; - jnt.scimark2.LU::factor at 237 (line 235) 

Both the sbfiz and the add are computed twice from the same inputs, so the
second sbfiz/add pair is fully redundant. The redundant instructions are
caused by 2 ConvI2L nodes that are identical except for their type:

1984 ConvI2L _ 1683 [ 1987 ] debug_idx = 33701984 type = long: dump_spec = #long:minint..maxint-1:www 
1678 ConvI2L _ 1683 [ 1672 ] bci = 49 debug_orig = dump_spec = #long:0..maxint-1:www debug_idx = 30401678 line = 183 type = long: 

The type doesn't help code generation on arm so it could be widened and
duplicated instructions could then be removed. I think the same could be
done on sparc.
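A toy value-numbering model of the point above, with hypothetical names. It is not C2's actual GVN code. If a ConvI2L's identity includes its type, the two nodes from the dump stay distinct. If the type is widened to the full int range first, they hash to the same entry and one copy disappears:

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// A long range, like the "minint..maxint-1" types in the dump above.
struct LongRange {
  std::int64_t lo, hi;
};

struct ConvI2LKey {
  int input;             // id of the int input node
  std::int64_t lo, hi;   // the node's type
  bool operator<(const ConvI2LKey& o) const {
    return std::tie(input, lo, hi) < std::tie(o.input, o.lo, o.hi);
  }
};

class ValueNumbering {
 public:
  explicit ValueNumbering(bool widen) : widen_(widen) {}

  // Returns the id of an existing equivalent node, or creates a new one.
  int conv_i2l(int input, LongRange type) {
    if (widen_) {
      type = {INT32_MIN, INT32_MAX};  // drop the narrowed range
    }
    ConvI2LKey key{input, type.lo, type.hi};
    auto it = table_.find(key);
    if (it != table_.end()) return it->second;
    int id = next_id_++;
    table_.emplace(key, id);
    return id;
  }

 private:
  bool widen_;
  int next_id_ = 0;
  std::map<ConvI2LKey, int> table_;
};
```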

Roland.

From dmitrij.pochepko at oracle.com  Thu Apr 28 16:39:30 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 28 Apr 2016 19:39:30 +0300
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant
	javadoc should be updated for null JavaKind case
Message-ID: <57223CC2.6000309@oracle.com>

Hi,

please review changes for 8155244 - JVMCI: 
MemoryAccessProvider.readUnsafeConstant javadoc should be updated for 
null JavaKind case

The MemoryAccessProvider.readUnsafeConstant implementation throws a 
NullPointerException if the JavaKind is null. That behavior wasn't 
specified in the javadoc.
This small fix throws IllegalArgumentException instead and updates the 
javadoc accordingly.

CR: https://bugs.openjdk.java.net/browse/JDK-8155244
webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/

I've tested fix on linux_x64 via jprt.

Thanks,
Dmitrij

From zoltan.majo at oracle.com  Thu Apr 28 17:44:48 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 28 Apr 2016 19:44:48 +0200
Subject: [9] RFR(XS): 8155653: TestVectorUnalignedOffset.java not pushed with
	8155612
Message-ID: <57224C10.9060704@oracle.com>

Hi,


The test TestVectorUnalignedOffset.java was originally included (and 
reviewed) with JDK-8155612:
http://cr.openjdk.java.net/~roland/8155612/webrev.00/

I sponsored the changeset but, by accident, did not push the test. So 
I've filed the current bug to address that problem.

If there are no objections, I'll push the test separately tomorrow (with 
Roland as a contributor). Here is the webrev:
http://cr.openjdk.java.net/~zmajo/8155653/webrev.00/

Sorry for the noise.

Best regards,


Zoltan


From michael.c.berg at intel.com  Thu Apr 28 18:09:44 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 28 Apr 2016 18:09:44 +0000
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <5722245E.7090202@redhat.com>
References: <dk67ffko335.fsf@rwestrel.remote.csb>
	<C568518E7B433348B114B6A7122D474756E652F1@FMSMSX102.amr.corp.intel.com>
	<5720E060.2050308@redhat.com>
	<47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com>
	<C568518E7B433348B114B6A7122D474756E658E3@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474756E659CC@FMSMSX102.amr.corp.intel.com>
	<5722245E.7090202@redhat.com>
Message-ID: <C568518E7B433348B114B6A7122D474756E65D23@FMSMSX102.amr.corp.intel.com>

Ok, Roland that would be great, please do add it to your next changeset.

Regards,
Michael

-----Original Message-----
From: Roland Westrelin [mailto:rwestrel at redhat.com] 
Sent: Thursday, April 28, 2016 7:55 AM
To: Berg, Michael C <michael.c.berg at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: SuperWord::unrolling_analysis() question

Hi Michael,

> Roland, for superword.cpp you only need this one-line change, which 
> I have tested and which has no negative side effects on x86.  It 
> will address the issue (oddly enough, it's where we started):
> 
> Line 201   int max_vector = Matcher::max_vector_size(T_BYTE);

Thanks for experimenting with this. That looks fine to me. I can make that change as part of the patch that enables superword unrolling analysis on aarch64 unless you'd like to do it as a separate change?

Roland.

From christian.thalinger at oracle.com  Thu Apr 28 19:03:34 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 28 Apr 2016 09:03:34 -1000
Subject: RFR(XS): 8155163 - JVMCI:
	MethodHandleAccessProvider.resolveInvokeBasicTarget
	implementation doesn't match javadoc
In-Reply-To: <57221E11.4000406@oracle.com>
References: <57221E11.4000406@oracle.com>
Message-ID: <CCF91280-E377-4852-B49D-E60DBFE476D3@oracle.com>

Looks good.

> On Apr 28, 2016, at 4:28 AM, Dmitrij Pochepko <dmitrij.pochepko at oracle.com> wrote:
> 
> Hi,
> 
> please review fix for JDK-8155163 - JVMCI: MethodHandleAccessProvider.resolveInvokeBasicTarget implementation doesn't match javadoc
> 
> The MethodHandleAccessProvider.resolveInvokeBasicTarget implementation throws an NPE if the first parameter doesn't represent a MethodHandle, but according to the javadoc it should return null in that case. This small fix makes the implementation match the javadoc.
> 
> CR: https://bugs.openjdk.java.net/browse/JDK-8155163 <https://bugs.openjdk.java.net/browse/JDK-8155163>
> Webrev: http://cr.openjdk.java.net/~dpochepk/8155163/webrev.01/ <http://cr.openjdk.java.net/~dpochepk/8155163/webrev.01/>
> 
> I've tested this change on linux_x64 via jprt.
> 
> Thanks,
> Dmitrij

From vladimir.kozlov at oracle.com  Thu Apr 28 19:14:46 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 12:14:46 -0700
Subject: [9] RFR(XS): 8155653: TestVectorUnalignedOffset.java not pushed
	with 8155612
In-Reply-To: <57224C10.9060704@oracle.com>
References: <57224C10.9060704@oracle.com>
Message-ID: <57226126.8010708@oracle.com>

Good.

Thanks,
Vladimir

On 4/28/16 10:44 AM, Zoltán Majó wrote:
> Hi,
>
>
> The test TestVectorUnalignedOffset.java was originally included (and reviewed) with JDK-8155612:
> http://cr.openjdk.java.net/~roland/8155612/webrev.00/
>
> I sponsored the changeset but, by accident, did not push the test. So I've filed the current bug to address that problem.
>
> If there are no objections, I'll push the test separately tomorrow (with Roland as a contributor). Here is the webrev:
> http://cr.openjdk.java.net/~zmajo/8155653/webrev.00/
>
> Sorry for the noise.
>
> Best regards,
>
>
> Zoltan
>

From vladimir.kozlov at oracle.com  Thu Apr 28 19:16:29 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 12:16:29 -0700
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant
	javadoc should be updated for null JavaKind case
In-Reply-To: <57223CC2.6000309@oracle.com>
References: <57223CC2.6000309@oracle.com>
Message-ID: <5722618D.7010801@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/28/16 9:39 AM, Dmitrij Pochepko wrote:
> Hi,
>
> please review changes for 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
>
> The MemoryAccessProvider.readUnsafeConstant implementation throws NullPointerException if the JavaKind is null, which the javadoc doesn't specify.
> This small fix throws IllegalArgumentException instead and updates the javadoc accordingly.
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8155244
> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/
>
> I've tested fix on linux_x64 via jprt.
>
> Thanks,
> Dmitrij

From christian.thalinger at oracle.com  Thu Apr 28 19:23:16 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 28 Apr 2016 09:23:16 -1000
Subject: RFR 8155047: [JVMCI] findLeafConcreteSubtype should handle arrays
	of leaf concrete subtype
In-Reply-To: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com>
References: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com>
Message-ID: <6D4CBD60-A855-44A6-9C27-04D2012D050D@oracle.com>


> On Apr 28, 2016, at 6:13 AM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~never/8155047/webrev

         public LeafType(ResolvedJavaType context) {
+            assert !context.isLeaf() : "assumption isn't required for leaf types";
This assert is confusing.  The assumption is that a given type has no subtypes, which is also true for leaf types.  Does this assert make sure it's not used in the wrong places?

> 
> findLeafConcreteSubtype should use the same machinery for the elemental type when identifying leaf array types.
> 
> tom

From christian.thalinger at oracle.com  Thu Apr 28 19:25:25 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 28 Apr 2016 09:25:25 -1000
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant
	javadoc should be updated for null JavaKind case
In-Reply-To: <57223CC2.6000309@oracle.com>
References: <57223CC2.6000309@oracle.com>
Message-ID: <1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com>

+     * @throws IllegalArgumentException if {@code kind} is {@code null}, {@code kind} is {@link JavaKind#Void} or
+     *             not {@linkplain JavaKind#isPrimitive() primitive} kind
I don't think you have to repeat {@code kind}.

> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko <dmitrij.pochepko at oracle.com> wrote:
> 
> Hi,
> 
> please review changes for 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
> 
> The MemoryAccessProvider.readUnsafeConstant implementation throws NullPointerException if the JavaKind is null, which the javadoc doesn't specify.
> This small fix throws IllegalArgumentException instead and updates the javadoc accordingly.
> 
> CR: https://bugs.openjdk.java.net/browse/JDK-8155244
> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/
> 
> I've tested fix on linux_x64 via jprt.
> 
> Thanks,
> Dmitrij

From vladimir.kozlov at oracle.com  Thu Apr 28 19:43:17 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 12:43:17 -0700
Subject: RFR(S): 8154943: AArch64: redundant address computation
	instructions with vectorization
In-Reply-To: <dk64malofn1.fsf@rwestrel.remote.csb>
References: <dk64malofn1.fsf@rwestrel.remote.csb>
Message-ID: <572267D5.8020605@oracle.com>

node.cpp change is good.

In compile.cpp, I understand that you replace a "similar" node (same in(1)) with it, but it is not clear that you also process its users (the whole following chain) to remove similar nodes. Add a comment.
I think the check "!(k->Opcode() == Op_ConvI2L || ... " (using 'continue' instead of 'break') should be done when you push a node on the list with wq.push(u).

Thanks,
Vladimir

On 4/28/16 9:35 AM, Roland Westrelin wrote:
>
> http://cr.openjdk.java.net/~roland/8154943/webrev.00/
>
> Example generated code:
>
>   ;; B106: #	B106 B107 <- B105 B106 Loop: B106-B106 inner main of N612 Freq: 293550
>
>    0x000003ffa9275650: sbfiz	x0, x4, #3, #32
>    0x000003ffa9275654: add	x3, x5, x0
>    0x000003ffa9275658: sbfiz	x7, x4, #3, #32
>    0x000003ffa927565c: ldr	q19, [x3,#16]
>    0x000003ffa9275660: add	x19, x5, x7
>    0x000003ffa9275664: ldr	q20, [x19,#32] ;*daload {reexecute=0 rethrow=0 return_oop=0}
>                                                  ; - jnt.scimark2.LU::factor at 237 (line 235)
>
> Both sbfiz and add are strictly identical and so fully redundant. The
> redundant instructions are caused by 2 ConvI2L nodes that are identical
> except for their type:
>
> 1984 ConvI2L _ 1683 [ 1987 ] debug_idx = 33701984 type = long: dump_spec = #long:minint..maxint-1:www
> 1678 ConvI2L _ 1683 [ 1672 ] bci = 49 debug_orig = dump_spec = #long:0..maxint-1:www debug_idx = 30401678 line = 183 type = long:
>
> The type doesn't help code generation on arm so it could be widened and
> duplicated instructions could then be removed. I think the same could be
> done on sparc.
>
> Roland.
>
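A toy model may make the mechanism Roland describes easier to see. This is a minimal sketch, assuming value numbering keys a ConvI2L node on its input and its type range; ConvI2LKey, ToyGVN and widen are hypothetical names, not HotSpot classes. With the two different ranges from the dump above the nodes are not commoned, but once both are widened to the full int range they map to the same entry.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical, heavily simplified model of value numbering: a ConvI2L node is
// identified by its input node id and the bounds of its long type.
struct ConvI2LKey {
  int input_id;     // id of the int input node (1683 in the dump above)
  int64_t type_lo;  // lower bound of the node's long type
  int64_t type_hi;  // upper bound of the node's long type
  bool operator<(const ConvI2LKey& o) const {
    return std::tie(input_id, type_lo, type_hi) <
           std::tie(o.input_id, o.type_lo, o.type_hi);
  }
};

class ToyGVN {
 public:
  // Returns the id of an equivalent node if one exists, else registers a new one.
  int find_or_add(const ConvI2LKey& key) {
    auto it = table_.find(key);
    if (it != table_.end()) return it->second;  // commoned with the existing node
    int id = next_id_++;
    table_.emplace(key, id);
    return id;
  }
  std::size_t size() const { return table_.size(); }

 private:
  std::map<ConvI2LKey, int> table_;
  int next_id_ = 0;
};

// Widens the type to the full int range, which is always a correct (if less
// precise) type for the result of an int-to-long conversion.
ConvI2LKey widen(ConvI2LKey k) {
  k.type_lo = INT_MIN;
  k.type_hi = INT_MAX;
  return k;
}
```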

From dean.long at oracle.com  Thu Apr 28 20:24:25 2016
From: dean.long at oracle.com (Dean Long)
Date: Thu, 28 Apr 2016 13:24:25 -0700
Subject: RFR(S): 8154943: AArch64: redundant address computation
	instructions with vectorization
In-Reply-To: <dk64malofn1.fsf@rwestrel.remote.csb>
References: <dk64malofn1.fsf@rwestrel.remote.csb>
Message-ID: <b61e768f-03d8-d3cc-de90-92a50e3cabf2@oracle.com>

Are there any platforms where this won't help?

How about adding a Matcher function instead of a cpu-specific ifdef.

dl


On 4/28/2016 9:35 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8154943/webrev.00/
>
> Example generated code:
>
>   ;; B106: #	B106 B107 <- B105 B106 Loop: B106-B106 inner main of N612 Freq: 293550
>
>    0x000003ffa9275650: sbfiz	x0, x4, #3, #32
>    0x000003ffa9275654: add	x3, x5, x0
>    0x000003ffa9275658: sbfiz	x7, x4, #3, #32
>    0x000003ffa927565c: ldr	q19, [x3,#16]
>    0x000003ffa9275660: add	x19, x5, x7
>    0x000003ffa9275664: ldr	q20, [x19,#32] ;*daload {reexecute=0 rethrow=0 return_oop=0}
>                                                  ; - jnt.scimark2.LU::factor at 237 (line 235)
>
> Both sbfiz and add are strictly identical and so fully redundant. The
> redundant instructions are caused by 2 ConvI2L nodes that are identical
> except for their type:
>
> 1984 ConvI2L _ 1683 [ 1987 ] debug_idx = 33701984 type = long: dump_spec = #long:minint..maxint-1:www
> 1678 ConvI2L _ 1683 [ 1672 ] bci = 49 debug_orig = dump_spec = #long:0..maxint-1:www debug_idx = 30401678 line = 183 type = long:
>
> The type doesn't help code generation on arm so it could be widened and
> duplicated instructions could then be removed. I think the same could be
> done on sparc.
>
> Roland.


From vladimir.kozlov at oracle.com  Thu Apr 28 22:50:17 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 15:50:17 -0700
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted
	offset addressing mode
In-Reply-To: <57222F87.3040602@redhat.com>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com>
	<53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap>
	<5720718B.6020102@redhat.com>
	<ac314003e0ac40d8b2cab337dc821748@DEWDFE13DE14.global.corp.sap>
	<6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com>
	<c12dce50d0fc414ca29160db1d0c6837@DEWDFE13DE14.global.corp.sap>
	<57222F87.3040602@redhat.com>
Message-ID: <572293A9.5090702@oracle.com>

Do it separately, please.

Thanks,
Vladimir

On 4/28/16 8:43 AM, Roland Westrelin wrote:
>
> Hi Martin,
>
>> Roland, sorry that we discuss so much of my stuff in your thread for
>> 8154826. I think it is slightly related to your change and it touches
>> the same files. So it might be easier to sell it as add-on to your
>> change?
>>
>> If you prefer to handle it separated from 8154826, please let me
>> know. I could open a new bug for it. Disadvantage would be that
>> someone will have to take care of closed source platforms and I will
>> need a sponsor from Oracle.
>
> I would say it would be cleaner to keep the 2 changes separated (a
> change that affects all platforms hidden in a platform specific one
> might be confusing to someone looking at the history of changes). But I
> don't really have a strong opinion either so I'll let those who would do
> the extra sponsoring work decide. Vladimir, what do you think?
>
> Roland.
>

From vladimir.kozlov at oracle.com  Thu Apr 28 22:55:25 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 15:55:25 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
Message-ID: <572294DD.40601@oracle.com>

Looks good.

In serialization/BinaryParser.java Copyright year change is wrong.

Thanks,
Vladimir

On 4/28/16 9:26 AM, Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/8154483/webrev
>
> This is a collection of small improvements to IGV developed while working on Graal.  I've categorized them below
>
> Bug fixes:
>    Reset folder in top component to release reference to old graphs
>    Fix concurrent modification exception in IGV
>    Fix HTML quoting in tooltips and remove useless entries from search box
>
> UI improvements:
>    Fixed keybinding for open and save actions in IGV
>    Allow importing multiple files at once in IGV
>    Make node searches look through the graph history for a match
>
> Performance related improvements:
>    Add info message about time spent parsing files
>    Relax expensive assert in IGV
>    Reduce overhead of hash computation for graph identity checks
>    Increase Integer cache size in IGV
>    Share properties in IGV
>
> Binary graph format:
>    Add folder and graph property support to binary graphs in IGV
>    Handle Strings larger than buffer size properly in IGV
>

From vladimir.kozlov at oracle.com  Thu Apr 28 23:29:26 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 16:29:26 -0700
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <5721DCE7.1020102@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
	<571F5756.7020007@oracle.com>
	<0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>
	<5720CA98.10605@oracle.com>
	<8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com>
	<5721DCE7.1020102@oracle.com>
Message-ID: <57229CD6.9050003@oracle.com>

On 4/28/16 2:50 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> On 04/27/2016 11:55 PM, Vladimir Kozlov wrote:
>> [...]
>>> I agree. But I'd prefer we do that in the AllocatePrefetchStepSizeConstraintFunc() constraint function (in
>>> commandLineFlagConstraintsCompiler.cpp). The reason is that since JEP-245 the preferred place to validate
>>> command-line arguments is in constraint functions. Most compiler-related checks were moved there with JDK-8078554 and
>>> JDK-8146478.
>>
>> Okay. Usually we do platform-specific flag settings in vm_version_<arch>.cpp files, but it looks like that is starting to change.
>> I am not a supporter of having multiple #ifdefs in shared code (in commandLineFlagConstraintsCompiler).
>
> JEP-245 proposed to have ranges/constraints defined for all flags. Some of these ranges/constraints are unavoidably architecture-specific.
>
> I agree with you that having lots of #ifdefs in shared code does not improve code readability. But I'm wondering what would be a good way to achieve the goals of JEP-245 without using too many
> #ifdefs. Maybe architecture-specific constraint files (e.g., commandLineFlagConstraintsCompiler_solaris.cpp, commandLineFlagConstraintsCompiler_x86.cpp, etc.)?

Yes, that will work.

>
> At least with the current change we won't add more #ifdefs...
>
>>
>>>
>>> I'd also like to set the minimum value for AllocatePrefetchDistance to AllocatePrefetchStepSize, otherwise we can access
>>> the heap before newly allocated objects/arrays.
>>
>> Only for style 3. And I said it may be better to change code.
>
> Yes, you've mentioned that in the first round of reviews and I missed your point in the subsequent rounds. Sorry for that.
>
> I've updated the code in macro.cpp and the constraint AllocatePrefetchDistanceConstraintFunc() as you've suggested.
>
>> [...]
>>>>
>>>> Agree. I think AllocatePrefetchStyle=2 is broken on all platforms; it should be used only if UseTLAB is true. Please take a
>>>> look.
>>>
>>> It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true.
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844
>>
>> I got a report of it crashing on T7 (attached). Maybe it is because it uses BIS with it by default.
>
> I think using BIS with AllocatePrefetchStyle=2 is indeed the cause.
>
> I've executed the following on our T7 machine with b115:
> * java -XX:AllocatePrefetchStyle=2 (uses AllocatePrefetchInstr=1 by default) -> crashes
> * java -XX:AllocatePrefetchStyle=2 -XX:AllocatePrefetchInstr=0 -> works
>
> With the newest webrev, both commands pass.
>
> Here is the newest webrev:
> http://cr.openjdk.java.net/~zmajo/8153340/webrev.03/

This looks good.

Thanks,
Vladimir

>
> I re-did testing with JPRT, the results look OK.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, I've started an RBT run with all hotspot on all
>>> platforms using AllocatePrefetchStyle=2. So far no problems have shown up.
>>>
>>>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch should be set for all styles on all platforms to
>>>> avoid accessing beyond the heap. The prefetch instruction documentation says that they do not trap, but we should be careful.
>>>
>>> I agree.
>>>
>>> That means we initialize _reserve_for_allocation_prefetch in a platform-independent way. So I think it would make sense
>>> to move that field to ThreadLocalAllocBuffer, as TLAB is the only user of that field and we don't support
>>> platform-independent initialization of Abstract_VM_Version. I did that in the updated webrev.
>>>
>>>>>
>>>>> The updated webrev does the following (in addition to fixing the original problem with AllocatePrefetchDistance):
>>>>>
>>>>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., BIS instructions are used for prefetching). As
>>>>> far as I understand, AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so if BIS is enabled, we
>>>>> should use AllocatePrefetchStyle = 3.
>>>>
>>>> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select AllocatePrefetchStyle = 3.
>>>
>>> OK.
>>>
>>>> But we should allow AllocatePrefetchStyle = 3  if normal prefetch instructions (or other platforms) are used.
>>>
>>> OK, I've removed that restriction.
>>>
>>>> I think we should update comment in globals.hpp to say "generate one prefetch per cache line" without saying BIS.
>>>
>>> OK.
>>>
>>>>
>>>> But I agree if BIS is not available we should not use BIS AllocatePrefetchInstr = 1.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on non-SPARC platforms.
>>>>
>>>> It could be useful on other platforms since it does one access per cache line.
>>>
>>> OK, let's keep it available then.
>>>
>>>>
>>>>
>>>>>
>>>>> 3. Enforce that AllocatePrefetchStepSize is multiple of 8 if AllocatePrefetchStyle is 3 (due to alignment requirements).
>>>>
>>>> That is correct, since stxa requires at least 8-byte alignment (as does stx).
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> 4. Determine the number of lines to prefetch in the same way for all prefetch styles:
>>>>> lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines
>>>>
>>>> Agree.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> Here is the updated webrev:
>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/
>>>>
>>>> vm_version_sparc.cpp
>>>> AllocatePrefetchInstr = 0 should be set for all styles (not only 1) when BIS is not available.
>>>
>>> OK. I think it is sufficient to do
>>>
>>> if (!has_blk_init()) {
>>>   if (AllocatePrefetchInstr == 1) {
>>>     warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable");
>>>     FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0);
>>>   }
>>> }
>>>
>>> I hope I'm not missing anything here. Here is the updated webrev:
>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/
>>>
>>> Testing:
>>> - JPRT (incl. TestOptionsWithRanges);
>>> - local testing on SPARC;
>>> - all hotspot tests with AllocatePrefetchStyle=2 on all platforms.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>> Zoltan
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Testing:
>>>>> - JPRT (incl. TestOptionsWithRanges)
>>>>> - local testing on a SPARC machine.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> please review the patch for 8153340.
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>>>>>>
>>>>>>>
>>>>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way
>>>>>>> the address for the first prefetch instruction is calculated [1]:
>>>>>>>
>>>>>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1), which can clear some of
>>>>>>> the low-order bits of cache_adr. That results in accesses *before* the newly allocated object.
>>>>>>>
>>>>>>>
>>>>>>> Solution: Set lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3).
>>>>>>> Unquarantine test.
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>>>>>>
>>>>>>> Testing:
>>>>>>> - JPRT (incl. TestOptionsWithRanges.java)
>>>>>>> - local testing on a SPARC machine.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltan
>>>>>>>
>>>>>>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941
>>>>>
>>>
>
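The address arithmetic from the original problem report can be sketched as follows. This is a simplified model, not the macro.cpp code: first_prefetch_addr and prefetch_flags_ok are hypothetical names, step_size stands in for AllocatePrefetchStepSize (assumed to be a power of two) and distance for AllocatePrefetchDistance. With distance == 0 and an unaligned old_eden_top, rounding down lands before the new object; requiring distance >= step_size keeps the first prefetch address above old_eden_top, because rounding down subtracts at most step_size - 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the first prefetch address for AllocatePrefetchStyle=3.
// step_size stands in for AllocatePrefetchStepSize (assumed a power of two),
// distance for AllocatePrefetchDistance.
std::uintptr_t first_prefetch_addr(std::uintptr_t old_eden_top,
                                   std::uintptr_t distance,
                                   std::uintptr_t step_size) {
  std::uintptr_t cache_adr = old_eden_top + distance;
  cache_adr &= ~(step_size - 1);  // round down to a step_size boundary
  return cache_adr;
}

// Sketch of the constraints discussed in the thread for style 3: the step size
// must be a multiple of 8 and the distance must be at least the step size.
bool prefetch_flags_ok(int style, std::uintptr_t distance,
                       std::uintptr_t step_size) {
  if (style != 3) return true;
  return step_size % 8 == 0 && distance >= step_size;
}
```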

From vladimir.kozlov at oracle.com  Thu Apr 28 23:54:32 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 16:54:32 -0700
Subject: [9] RFR(S): 8142464: PlatformLoggerTest.java throws
	java.lang.RuntimeException: Logger test.logger.bar does not exist
In-Reply-To: <5721FB8F.40207@oracle.com>
References: <5721FB8F.40207@oracle.com>
Message-ID: <5722A2B8.7000100@oracle.com>

Good.

thanks,
Vladimir

On 4/28/16 5:01 AM, Nils Eliasson wrote:
> Hi,
>
> Please review the fix of jdk/test/sun/util/logging/PlatformLoggerTest.java that has been failing intermittently in our nightlies.
>
> Summary:
> The test uses loggers that are accessible by weak refs from a LogManager. Since the test doesn't keep a strong ref to the loggers they may get collected during the test.
>
> Solution:
> Save loggers to a static field  in the test class.
>
> Other:
> Also removing "@compile -XDignore.symbol.file", which is unnecessary after Jigsaw. The compile tag forces a compile even if the class already exists, wasting time during reruns.
>
> Testing:
> Running tests on all platforms.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8142464
> webrev: http://cr.openjdk.java.net/~neliasso/8142464/webrev_jdk.01/ (JDK)
>
> Regards,
> Nils Eliasson
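The failure mode is easier to see in a deterministic analogy. This sketch uses C++ std::weak_ptr in place of Java weak references; Registry and Logger are hypothetical stand-ins for LogManager and its loggers, not the java.util.logging API. In C++ the object dies as soon as the last strong reference is dropped, whereas in Java it only disappears when the GC happens to run, which is consistent with the test failing intermittently rather than always.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical logger and registry; the registry, like the LogManager in the
// report, keeps only weak references to the loggers it hands out.
struct Logger {
  std::string name;
};

class Registry {
 public:
  std::shared_ptr<Logger> get_or_create(const std::string& name) {
    if (auto existing = loggers_[name].lock()) return existing;
    auto created = std::make_shared<Logger>(Logger{name});
    loggers_[name] = created;  // only a weak reference is stored
    return created;
  }

  bool exists(const std::string& name) const {
    auto it = loggers_.find(name);
    return it != loggers_.end() && !it->second.expired();
  }

 private:
  std::map<std::string, std::weak_ptr<Logger>> loggers_;
};
```

Holding the returned pointer (the analogue of saving the logger in a static field) is what keeps the registry entry alive.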

From tom.rodriguez at oracle.com  Thu Apr 28 23:55:49 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 16:55:49 -0700
Subject: RFR 8155047: [JVMCI] findLeafConcreteSubtype should handle arrays
	of leaf concrete subtype
In-Reply-To: <6D4CBD60-A855-44A6-9C27-04D2012D050D@oracle.com>
References: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com>
	<6D4CBD60-A855-44A6-9C27-04D2012D050D@oracle.com>
Message-ID: <81FC93A7-03F5-4A10-858A-9D227540D419@oracle.com>


> On Apr 28, 2016, at 12:23 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> 
>> On Apr 28, 2016, at 6:13 AM, Tom Rodriguez <tom.rodriguez at oracle.com <mailto:tom.rodriguez at oracle.com>> wrote:
>> 
>> http://cr.openjdk.java.net/~never/8155047/webrev <http://cr.openjdk.java.net/~never/8155047/webrev>
> 
>          public LeafType(ResolvedJavaType context) {
> +            assert !context.isLeaf() : "assumption isn't required for leaf types";
> This assert is confusing.  The assumption is that a given type has no subtypes, which is also true for leaf types.  Does this assert make sure it's not used in the wrong places?

LeafType is an assumption that a dynamic type that might someday have subclasses doesn't currently have any.  isLeaf() is a static guarantee that a type will never have subclasses, so we are asserting that we never emit a dynamic dependence for something that is statically true.  I'm open to new wording.

tom

> 
>> 
>> findLeafConcreteSubtype should use the same machinery for the elemental type when identifying leaf array types.
>> 
>> tom
> 
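The distinction Tom draws between a recorded assumption and a static guarantee can be sketched as below. Type, Assumptions and their fields are hypothetical, not JVMCI API; is_final plays the role of isLeaf(), and the assert mirrors the one in the LeafType constructor: a dependency is only recorded for a type that could later gain subclasses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types, not JVMCI API.
struct Type {
  std::string name;
  bool is_final;            // static guarantee, playing the role of isLeaf()
  bool has_subclasses_now;  // current (dynamic) state of the class hierarchy
};

class Assumptions {
 public:
  // Returns true if the caller may treat t as having no subtypes.
  bool assume_leaf(const Type& t) {
    if (t.is_final) return true;  // statically true: nothing to record
    if (t.has_subclasses_now) return false;
    record_leaf_dependency(t);    // may be invalidated if a subclass is loaded
    return true;
  }

  const std::vector<std::string>& recorded() const { return recorded_; }

 private:
  void record_leaf_dependency(const Type& t) {
    assert(!t.is_final && "assumption isn't required for leaf types");
    recorded_.push_back(t.name);
  }

  std::vector<std::string> recorded_;
};
```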

From tom.rodriguez at oracle.com  Thu Apr 28 23:57:02 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 16:57:02 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <572294DD.40601@oracle.com>
References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
	<572294DD.40601@oracle.com>
Message-ID: <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com>


> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good.
> 
> In serialization/BinaryParser.java Copyright year change is wrong.

Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch.  I'll fix it.  Actually, should I go ahead and update the copyright years for all these files?

tom

> 
> Thanks,
> Vladimir
> 
> On 4/28/16 9:26 AM, Tom Rodriguez wrote:
>> http://cr.openjdk.java.net/~never/8154483/webrev
>> 
>> This is a collection of small improvements to IGV developed while working on Graal.  I've categorized them below
>> 
>> Bug fixes:
>>   Reset folder in top component to release reference to old graphs
>>   Fix concurrent modification exception in IGV
>>   Fix HTML quoting in tooltips and remove useless entries from search box
>> 
>> UI improvements:
>>   Fixed keybinding for open and save actions in IGV
>>   Allow importing multiple files at once in IGV
>>   Make node searches look through the graph history for a match
>> 
>> Performance related improvements:
>>   Add info message about time spent parsing files
>>   Relax expensive assert in IGV
>>   Reduce overhead of hash computation for graph identity checks
>>   Increase Integer cache size in IGV
>>   Share properties in IGV
>> 
>> Binary graph format:
>>   Add folder and graph property support to binary graphs in IGV
>>   Handle Strings larger than buffer size properly in IGV
>> 


From vladimir.kozlov at oracle.com  Fri Apr 29 00:03:09 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 17:03:09 -0700
Subject: RFR: 8153655: Make intrinsics flags diagnostic and update
	intrinsics tests to enable diagnostic options.
In-Reply-To: <b161dc52-6169-4e55-8ec2-9349d03b172b@default>
References: <b161dc52-6169-4e55-8ec2-9349d03b172b@default>
Message-ID: <5722A4BD.5060607@oracle.com>

Hi Rahul,

The changes look good, but you need to update your changes to the SHA tests because I changed those tests for JDK-8154495:

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6a17c49de974

Thanks,
Vladimir

On 4/27/16 2:45 AM, Rahul Raghavan wrote:
> Hi,
>
> Please review the following patch for JDK-8153655.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655
> Webrev: http://cr.openjdk.java.net/~rraghavan/8153655/webrev.00/
>
>
> Notes:
>
> 1. This 8153655/webrev.00 re-includes earlier backed out, same JDK-8145348 changes
>         (https://bugs.openjdk.java.net/browse/JDK-8145348 - Make intrinsics flags diagnostic)
> and also additional fixes in failing intrinsic tests.
>
>
> 2. Checked all the usages of changed intrinsic flags in tests and
> found that the JDK-8153655 type of test failure (after the initial JDK-8145348 fix) is present only in the following tests:
>     a. UseAESIntrinsics test (compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java)
>     b. UseSHA* tests (at compiler/intrinsics/sha/cli/)
>
>
> 3. Summary of 8153655/webrev.00 changes.
>
> - Includes earlier backed out, same JDK-8145348 changes:
>         src/share/vm/c1/c1_globals.hpp
>         src/share/vm/opto/c2_globals.hpp
>         src/share/vm/runtime/globals.hpp
>         test/compiler/intrinsics/muladd/TestMulAdd.java
>         test/compiler/runtime/6859338/Test6859338.java
>
> - 'test/compiler/cpuflags/AESIntrinsicsBase.java'
>        Options were passed in wrong order.
>        Changes done so that 'UnlockDiagnosticVMOptions' option precedes the diagnostic flags.
>
> - 'test/compiler/intrinsics/sha/cli/*' - (UseSHA* tests)
>       'UnlockDiagnosticVMOptions' option was not getting passed.
>       Added support to precede intrinsic flag usages with explicit 'UnlockDiagnosticVMOptions'.
>
>
> 4. No issues found with testing done using product builds with proposed changes
> (hotspot/test/compiler/cpuflags/*, hotspot/test/compiler/intrinsics/*, hotspot/test/compiler/runtime/6859338/Test6859338.java)
> Complete pre-integration testing using product builds is in progress.
>
>
> Thanks,
> Rahul
>
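The ordering problem described for AESIntrinsicsBase.java can be illustrated with a small helper that places -XX:+UnlockDiagnosticVMOptions ahead of any diagnostic flag. This is a hypothetical sketch: the helper names and the caller-supplied set of diagnostic flag names are assumptions, and the real tests build their option lists in Java.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Extracts "Name" from "-XX:+Name", "-XX:-Name" or "-XX:Name=value";
// returns "" for options that are not -XX flags.
std::string xx_flag_name(const std::string& opt) {
  const std::string prefix = "-XX:";
  if (opt.compare(0, prefix.size(), prefix) != 0) return "";
  std::string rest = opt.substr(prefix.size());
  if (!rest.empty() && (rest[0] == '+' || rest[0] == '-')) rest.erase(0, 1);
  return rest.substr(0, rest.find('='));
}

// Returns opts with -XX:+UnlockDiagnosticVMOptions moved (or added) in front
// if any option names a flag from diagnostic_flags; otherwise returns opts as is.
std::vector<std::string> with_diagnostics_unlocked(
    const std::vector<std::string>& opts,
    const std::set<std::string>& diagnostic_flags) {
  const std::string unlock = "-XX:+UnlockDiagnosticVMOptions";
  bool needs_unlock = std::any_of(
      opts.begin(), opts.end(), [&](const std::string& o) {
        return diagnostic_flags.count(xx_flag_name(o)) > 0;
      });
  if (!needs_unlock) return opts;
  std::vector<std::string> out{unlock};
  for (const auto& o : opts) {
    if (o != unlock) out.push_back(o);  // drop the misplaced original, if any
  }
  return out;
}
```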

From vladimir.kozlov at oracle.com  Fri Apr 29 00:05:02 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 17:05:02 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com>
References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
	<572294DD.40601@oracle.com>
	<8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com>
Message-ID: <5722A52E.6030502@oracle.com>

On 4/28/16 4:57 PM, Tom Rodriguez wrote:
>
>> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> Looks good.
>>
>> In serialization/BinaryParser.java Copyright year change is wrong.
>
> Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch.  I'll fix it.  Actually, should I go ahead and update the copyright years for all these files?

Yes, please.

Thanks,
Vladimir

>
> tom
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/28/16 9:26 AM, Tom Rodriguez wrote:
>>> http://cr.openjdk.java.net/~never/8154483/webrev
>>>
>>> This is a collection of small improvements to IGV developed while working on Graal.  I've categorized them below
>>>
>>> Bug fixes:
>>>    Reset folder in top component to release reference to old graphs
>>>    Fix concurrent modification exception in IGV
>>>    Fix HTML quoting in tooltips and remove useless entries from search box
>>>
>>> UI improvements:
>>>    Fixed keybinding for open and save actions in IGV
>>>    Allow importing multiple files at once in IGV
>>>    Make node searches look through the graph history for a match
>>>
>>> Performance related improvements:
>>>    Add info message about time spent parsing files
>>>    Relax expensive assert in IGV
>>>    Reduce overhead of hash computation for graph identity checks
>>>    Increase Integer cache size in IGV
>>>    Share properties in IGV
>>>
>>> Binary graph format:
>>>    Add folder and graph property support to binary graphs in IGV
>>>    Handle Strings larger than buffer size properly in IGV
>>>
>

From tom.rodriguez at oracle.com  Fri Apr 29 00:08:28 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 17:08:28 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <5722A52E.6030502@oracle.com>
References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
	<572294DD.40601@oracle.com>
	<8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com>
	<5722A52E.6030502@oracle.com>
Message-ID: <DAB9B4F9-BCE0-4AFE-B593-6A08715BB617@oracle.com>


> On Apr 28, 2016, at 5:05 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> On 4/28/16 4:57 PM, Tom Rodriguez wrote:
>> 
>>> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>> 
>>> Looks good.
>>> 
>>> In serialization/BinaryParser.java Copyright year change is wrong.
>> 
>> Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch.  I'll fix it.  Actually, should I go ahead and update the copyright years for all these files?
> 
> Yes, please.

They're all updated.

tom

> 
> Thanks,
> Vladimir
> 
>> 
>> tom
>> 
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>> On 4/28/16 9:26 AM, Tom Rodriguez wrote:
>>>> http://cr.openjdk.java.net/~never/8154483/webrev
>>>> 
>>>> This is a collection of small improvements to IGV developed while working on Graal.  I've categorized them below
>>>> 
>>>> Bug fixes:
>>>>   Reset folder in top component to release reference to old graphs
>>>>   Fix concurrent modification exception in IGV
>>>>   Fix HTML quoting in tooltips and remove useless entries from search box
>>>> 
>>>> UI improvements:
>>>>   Fixed keybinding for open and save actions in IGV
>>>>   Allow importing multiple files at once in IGV
>>>>   Make node searches look through the graph history for a match
>>>> 
>>>> Performance related improvements:
>>>>   Add info message about time spent parsing files
>>>>   Relax expensive assert in IGV
>>>>   Reduce overhead of hash computation for graph identity checks
>>>>   Increase Integer cache size in IGV
>>>>   Share properties in IGV
>>>> 
>>>> Binary graph format:
>>>>   Add folder and graph property support to binary graphs in IGV
>>>>   Handle Strings larger than buffer size properly in IGV
>>>> 
>> 


From vladimir.kozlov at oracle.com  Fri Apr 29 00:14:17 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 17:14:17 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <DAB9B4F9-BCE0-4AFE-B593-6A08715BB617@oracle.com>
References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
	<572294DD.40601@oracle.com>
	<8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com>
	<5722A52E.6030502@oracle.com>
	<DAB9B4F9-BCE0-4AFE-B593-6A08715BB617@oracle.com>
Message-ID: <5722A759.80706@oracle.com>

Wow, did we really get IGV in 1998?

Looks good.

Thanks,
Vladimir

On 4/28/16 5:08 PM, Tom Rodriguez wrote:
>
>> On Apr 28, 2016, at 5:05 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> On 4/28/16 4:57 PM, Tom Rodriguez wrote:
>>>
>>>> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> Looks good.
>>>>
>>>> In serialization/BinaryParser.java Copyright year change is wrong.
>>>
>>> Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch.  I'll fix it.  Actually, should I go ahead and update the copyright years for all these files?
>>
>> Yes, please.
>
> They're all updated.
>
> tom
>

From tom.rodriguez at oracle.com  Fri Apr 29 04:17:54 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 21:17:54 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <5722A759.80706@oracle.com>
References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
	<572294DD.40601@oracle.com>
	<8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com>
	<5722A52E.6030502@oracle.com>
	<DAB9B4F9-BCE0-4AFE-B593-6A08715BB617@oracle.com>
	<5722A759.80706@oracle.com>
Message-ID: <5FD6EA5C-1144-46A7-B477-40B2BF3F4985@oracle.com>


> On Apr 28, 2016, at 5:14 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Wow, did we really get IGV in 1998?

I think it contains some elements that are that old but IGV was integrated into the hotspot repo in 2008.  It existed for a few years before that as Thomas's thesis project, so maybe 2006?

tom

> 
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 4/28/16 5:08 PM, Tom Rodriguez wrote:
>> 
>>> On Apr 28, 2016, at 5:05 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>> 
>>> On 4/28/16 4:57 PM, Tom Rodriguez wrote:
>>>> 
>>>>> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>> 
>>>>> Looks good.
>>>>> 
>>>>> In serialization/BinaryParser.java Copyright year change is wrong.
>>>> 
>>>> Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch.  I'll fix it.  Actually, should I go ahead and update the copyright years for all these files?
>>> 
>>> Yes, please.
>> 
>> They're all updated.
>> 
>> tom
>> 


From zoltan.majo at oracle.com  Fri Apr 29 06:37:35 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Fri, 29 Apr 2016 08:37:35 +0200
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for
	AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <57229CD6.9050003@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com>
	<571F5756.7020007@oracle.com>
	<0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com>
	<5720CA98.10605@oracle.com>
	<8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com>
	<5721DCE7.1020102@oracle.com> <57229CD6.9050003@oracle.com>
Message-ID: <5723012F.2020003@oracle.com>

Hi Vladimir,


On 04/29/2016 01:29 AM, Vladimir Kozlov wrote:
> [...]
>> JEP-245 proposed to have ranges/constraints defined for all flags. 
>> Some of these ranges/constraints are unavoidably architecture-specific.
>>
>> I agree with you that having lots of #ifdefs in shared code does not 
>> improve code readability. But I'm wondering what would be a good way 
>> to achieve the goals of JEP-245 without using too many
>> #ifdefs. Maybe architecture-specific constraint files (e.g., 
>> commandLineFlagConstraintsCompiler_solaris.cpp, 
>> commandLineFlagConstraintsCompiler_x86.cpp, etc.)?
>
> Yes, that will work.

OK, I'll keep that in mind when addressing similar problems in the future.

> [...]
>>
>> I think using BIS with AllocatePrefetchStyle=2 is indeed the cause.
>>
>> I've executed the following on our T7 machine with b115:
>> * java -XX:AllocatePrefetchStyle=2 (uses AllocatePrefetchInstr=1 by 
>> default) -> crashes
>> * java -XX:AllocatePrefetchStyle=2 -XX:AllocatePrefetchInstr=0 -> works
>>
>> With the newest webrev, both commands pass.
>>
>> Here is the newest webrev:
>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.03/
>
> This looks good.

Thank you for the review!

Best regards,


Zoltan
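The flag rules agreed on in this review can be summarised as one ergonomics step plus one constraint check. The Java sketch below is only a model, not HotSpot code: the class and method names are hypothetical, the flag names come from the thread, and the rules encode the outcome described in the quoted discussion (turn BIS off when block-init stores are unavailable, select style 3 when BIS is used, and for style 3 require an 8-byte-multiple step size and a distance of at least one step).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the AllocatePrefetch* flag handling discussed for 8153340.
class PrefetchFlags {
    int allocatePrefetchStyle;
    int allocatePrefetchInstr;      // 1 selects BIS (block-init stores) on SPARC
    int allocatePrefetchDistance;   // bytes
    int allocatePrefetchStepSize;   // bytes

    PrefetchFlags(int style, int instr, int distance, int stepSize) {
        this.allocatePrefetchStyle = style;
        this.allocatePrefetchInstr = instr;
        this.allocatePrefetchDistance = distance;
        this.allocatePrefetchStepSize = stepSize;
    }

    // Ergonomics: adjust flags to what the platform supports; returns any warnings.
    List<String> adjust(boolean hasBlockInit) {
        List<String> warnings = new ArrayList<>();
        if (!hasBlockInit && allocatePrefetchInstr == 1) {
            warnings.add("BIS instructions required for AllocatePrefetchInstr 1 unavailable");
            allocatePrefetchInstr = 0;
        }
        if (allocatePrefetchInstr == 1) {
            allocatePrefetchStyle = 3;  // style 3 generates one prefetch per cache line, as BIS needs
        }
        return warnings;
    }

    // Constraint check: returns null if the combination is valid, else an error message.
    String checkConstraints() {
        if (allocatePrefetchStyle == 3) {
            if (allocatePrefetchStepSize % 8 != 0) {
                return "AllocatePrefetchStepSize must be a multiple of 8 for style 3";
            }
            if (allocatePrefetchDistance < allocatePrefetchStepSize) {
                return "AllocatePrefetchDistance must be at least AllocatePrefetchStepSize for style 3";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PrefetchFlags flags = new PrefetchFlags(2, 1, 64, 64);
        System.out.println(flags.adjust(true));          // []
        System.out.println(flags.allocatePrefetchStyle); // 3
        System.out.println(flags.checkConstraints());    // null
    }
}
```

Keeping the ergonomics step separate from the constraint check mirrors the split between adjusting defaults and rejecting invalid user-specified combinations.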

>
> Thanks,
> Vladimir
>
>>
>> I re-did testing with JPRT, the results look OK.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, 
>>>> I've started an RBT run with all hotspot on all
>>>> platforms using AllocatePrefetchStyle=2. So far no problems have 
>>>> shown up.
>>>>
>>>>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch 
>>>>> should be set for all styles on all platforms to
>>>>> avoid accessing beyond the heap. The prefetch instruction docs say that 
>>>>> they do not trap, but we should be careful.
>>>>
>>>> I agree.
>>>>
>>>> That means we initialize _reserve_for_allocation_prefetch in a 
>>>> platform-independent way. So I think it would make sense
>>>> to move that field to ThreadLocalAllocBuffer, as TLAB is the only 
>>>> user of that field and we don't support
>>>> platform-independent initialization of Abstract_VM_Version. I did 
>>>> that in the updated webrev.
>>>>
>>>>>>
>>>>>> The updated webrev does the following (in addition to fixing the 
>>>>>> original problem with AllocatePrefetchDistance):
>>>>>>
>>>>>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 
>>>>>> (i.e., BIS instructions are used for prefetching). As
>>>>>> far as I understand, AllocatePrefetchStyle = 3 was added to 
>>>>>> support prefetching with BIS, so if BIS is enabled, we
>>>>>> should use AllocatePrefetchStyle = 3.
>>>>>
>>>>> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should 
>>>>> select AllocatePrefetchStyle = 3.
>>>>
>>>> OK.
>>>>
>>>>> But we should allow AllocatePrefetchStyle = 3 if normal prefetch 
>>>>> instructions (or other platforms) are used.
>>>>
>>>> OK, I've removed that restriction.
>>>>
>>>>> I think we should update comment in globals.hpp to say "generate 
>>>>> one prefetch per cache line" without saying BIS.
>>>>
>>>> OK.
>>>>
>>>>>
>>>>> But I agree that if BIS is not available we should not use BIS 
>>>>> (AllocatePrefetchInstr = 1).
>>>>
>>>> OK.
>>>>
>>>>>
>>>>>>
>>>>>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on 
>>>>>> non-SPARC platforms.
>>>>>
>>>>> It could be useful on other platforms since it does one access per 
>>>>> cache line.
>>>>
>>>> OK, let's keep it available then.
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> 3. Enforce that AllocatePrefetchStepSize is multiple of 8 if 
>>>>>> AllocatePrefetchStyle is 3 (due to alignment requirements).
>>>>>
>>>>> That is correct since stxa requires at least 8-byte alignment (as 
>>>>> does stx).
>>>>
>>>> OK.
>>>>
>>>>>
>>>>>>
>>>>>> 4. Determine the number of lines to prefetch in the same way for 
>>>>>> all prefetch styles:
>>>>>> lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines
>>>>>
>>>>> Agree.
>>>>
>>>> OK.
>>>>
>>>>>
>>>>>>
>>>>>> Here is the updated webrev:
>>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/
>>>>>
>>>>> vm_version_sparc.cpp
>>>>> AllocatePrefetchInstr = 0 should be set for all styles (not only 
>>>>> 1) when BIS is not available.
>>>>
>>>> OK. I think it is sufficient to do
>>>>
>>>>   if (!has_blk_init()) {
>>>>     if (AllocatePrefetchInstr == 1) {
>>>>       warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable");
>>>>       FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0);
>>>>     }
>>>>   }
>>>>
>>>> I hope I'm not missing anything here. Here is the updated webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/
>>>>
>>>> Testing:
>>>> - JPRT (incl. TestOptionsWithRanges);
>>>> - local testing on SPARC;
>>>> - all hotspot tests with AllocatePrefetchStyle=2 on all platforms.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Testing:
>>>>>> - JPRT (incl. TestOptionsWithRanges)
>>>>>> - local testing on a SPARC machine.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> please review the patch for 8153340.
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>>>>>>>
>>>>>>>>
>>>>>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and 
>>>>>>>> AllocatePrefetchDistance==0. The crash happens due to the way
>>>>>>>> the address for the first prefetch instruction is calculated [1]:
>>>>>>>>
>>>>>>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= 
>>>>>>>> ~(AllocatePrefetchStepSize - 1), which can zero some of
>>>>>>>> the bits of cache_adr. That results in accesses *before* the 
>>>>>>>> newly allocated object.
>>>>>>>>
>>>>>>>>
>>>>>>>> Solution: Set lower limit of AllocatePrefetchDistance to 
>>>>>>>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3).
>>>>>>>> Unquarantine test.
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> - JPRT (incl. TestOptionsWithRanges.java)
>>>>>>>> - local testing on a SPARC machine.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>> [1] 
>>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941
>>>>>>
>>>>
>>
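The address problem described in the original RFR is easy to see with concrete numbers. The following Java sketch mirrors only the arithmetic quoted there; the address values are made up, and it assumes AllocatePrefetchStepSize is a power of two, which the mask requires. With a distance of zero, rounding down moves the first prefetch below old_eden_top; with a distance of at least one step, the rounded address stays at or above it, which is the reasoning behind the new lower bound.

```java
// Sketch of how the first prefetch address is formed for AllocatePrefetchStyle=3.
class PrefetchAddress {
    // cache_adr = old_eden_top + distance, rounded down to a step-size boundary.
    // The mask is only correct if stepSize is a power of two.
    static long firstPrefetchAddress(long oldEdenTop, long distance, long stepSize) {
        long cacheAdr = oldEdenTop + distance;
        return cacheAdr & ~(stepSize - 1);
    }

    public static void main(String[] args) {
        long oldEdenTop = 0x1018;  // hypothetical top, not 64-byte aligned
        long step = 64;
        // distance == 0: rounding down lands before the new object
        System.out.println(Long.toHexString(firstPrefetchAddress(oldEdenTop, 0, step)));    // 1000
        // distance == step: rounding down can no longer drop below oldEdenTop
        System.out.println(Long.toHexString(firstPrefetchAddress(oldEdenTop, step, step))); // 1040
    }
}
```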


From rwestrel at redhat.com  Fri Apr 29 08:49:34 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 29 Apr 2016 10:49:34 +0200
Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling analysis
Message-ID: <5723201E.5080909@redhat.com>

http://cr.openjdk.java.net/~roland/8155717/webrev.00/

Loop unrolling analysis can help on arm for smaller data types (bytes).
For other data types, it drives more inlining which can also be
beneficial. This also includes the change Michael ok'ed to shared code.

Roland.

From aph at redhat.com  Fri Apr 29 08:53:11 2016
From: aph at redhat.com (Andrew Haley)
Date: Fri, 29 Apr 2016 09:53:11 +0100
Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling
	analysis
In-Reply-To: <5723201E.5080909@redhat.com>
References: <5723201E.5080909@redhat.com>
Message-ID: <572320F7.9030608@redhat.com>

On 29/04/16 09:49, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8155717/webrev.00/
> 
> Loop unrolling analysis can help on arm for smaller data types (bytes).
> For other data types, it drives more inlining which can also be
> beneficial. This also includes the change Michael ok'ed to shared code.

OK, cool.  Will Michael sponsor this, then?

Thanks,

Andrew.


From rwestrel at redhat.com  Fri Apr 29 08:56:15 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 29 Apr 2016 10:56:15 +0200
Subject: RFR(S): 8154943: AArch64: redundant address computation
	instructions with vectorization
In-Reply-To: <b61e768f-03d8-d3cc-de90-92a50e3cabf2@oracle.com>
References: <dk64malofn1.fsf@rwestrel.remote.csb>
	<b61e768f-03d8-d3cc-de90-92a50e3cabf2@oracle.com>
Message-ID: <572321AF.3000703@redhat.com>

> Are there any platforms where this won't help?

Yes. x86 needs it:

operand indPosIndexScale(any_RegP reg, rRegI idx, immI2 scale)
%{
  constraint(ALLOC_IN_RC(ptr_reg));
  predicate(n->in(3)->in(1)->as_Type()->type()->is_long()->_lo >= 0);
  match(AddP reg (LShiftL (ConvI2L idx) scale));

From a quick look through ppc.ad, it seems to need it too.

> How about adding a Matcher function instead of a cpu-specific ifdef.

Sounds like a good suggestion.

Roland.

From rwestrel at redhat.com  Fri Apr 29 08:58:28 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 29 Apr 2016 10:58:28 +0200
Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling
	analysis
In-Reply-To: <572320F7.9030608@redhat.com>
References: <5723201E.5080909@redhat.com> <572320F7.9030608@redhat.com>
Message-ID: <57232234.7020504@redhat.com>


> OK, cool.  Will Michael sponsor this, then?

Thanks for reviewing it. I don't think Michael has that power.

Roland.

From nils.eliasson at oracle.com  Fri Apr 29 12:31:27 2016
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 29 Apr 2016 14:31:27 +0200
Subject: [9] RFR(S): 8142464: PlatformLoggerTest.java throws
	java.lang.RuntimeException: Logger test.logger.bar does not exist
In-Reply-To: <5722A2B8.7000100@oracle.com>
References: <5721FB8F.40207@oracle.com> <5722A2B8.7000100@oracle.com>
Message-ID: <5723541F.2060708@oracle.com>

Thank you Vladimir!

//Nils

On 2016-04-29 01:54, Vladimir Kozlov wrote:
> Good.
>
> thanks,
> Vladimir
>
> On 4/28/16 5:01 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Please review the fix of 
>> jdk/test/sun/util/logging/PlatformLoggerTest.java that has been 
>> failing intermittently in our nightlies.
>>
>> Summary:
>> The test uses loggers that are accessible by weak refs from a 
>> LogManager. Since the test doesn't keep a strong ref to the loggers 
>> they may get collected during the test.
>>
>> Solution:
>> Save loggers to a static field in the test class.
>>
>> Other:
>> Also removing "@compile -XDignore.symbol.file", which is unnecessary 
>> after Jigsaw. The compile tag forces a compile even if the class 
>> already exists, wasting time during reruns.
>>
>> Testing:
>> Running tests on all platforms.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8142464
>> webrev: http://cr.openjdk.java.net/~neliasso/8142464/webrev_jdk.01/ 
>> (JDK)
>>
>> Regards,
>> Nils Eliasson
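The fix relies on LogManager holding named loggers only weakly. A minimal sketch of the pattern follows, using the public java.util.logging API rather than the PlatformLogger that the real test exercises; the class name, field name, and helper method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.LogManager;
import java.util.logging.Logger;

class LoggerRetention {
    // LogManager keeps only weak references to named loggers, so a strong
    // reference must be held for as long as the logger is expected to exist.
    private static final List<Logger> RETAINED = new ArrayList<>();

    static Logger retain(String name) {
        Logger logger = Logger.getLogger(name);
        RETAINED.add(logger);
        return logger;
    }

    public static void main(String[] args) {
        retain("test.logger.bar");
        System.gc();
        // Still registered, because RETAINED holds a strong reference.
        System.out.println(LogManager.getLogManager().getLogger("test.logger.bar") != null); // true
    }
}
```

Without the static list, the same lookup after a collection may return null, which matches the intermittent "Logger test.logger.bar does not exist" failure.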


From zoltan.majo at oracle.com  Fri Apr 29 13:37:00 2016
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Fri, 29 Apr 2016 15:37:00 +0200
Subject: [9] RFR(XS): 8155653: TestVectorUnalignedOffset.java not pushed
	with 8155612
In-Reply-To: <57226126.8010708@oracle.com>
References: <57224C10.9060704@oracle.com> <57226126.8010708@oracle.com>
Message-ID: <5723637C.4000602@oracle.com>

Thank you, Vladimir!

Best regards,


Zoltan

On 04/28/2016 09:14 PM, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir
>
> On 4/28/16 10:44 AM, Zoltán Majó wrote:
>> Hi,
>>
>>
>> The test TestVectorUnalignedOffset.java was originally included (and 
>> reviewed) with JDK-8155612:
>> http://cr.openjdk.java.net/~roland/8155612/webrev.00/
>>
>> I sponsored the changeset but, by accident, did not push the test. So 
>> I've filed the current bug to address that problem.
>>
>> If there are no objections, I'll push the test separately tomorrow 
>> (with Roland as a contributor). Here is the webrev:
>> http://cr.openjdk.java.net/~zmajo/8155653/webrev.00/
>>
>> Sorry for the noise.
>>
>> Best regards,
>>
>>
>> Zoltan
>>


From roland.schatz at oracle.com  Fri Apr 29 13:45:32 2016
From: roland.schatz at oracle.com (Roland Schatz)
Date: Fri, 29 Apr 2016 15:45:32 +0200
Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI exception
	stubs
Message-ID: <5723657C.6020804@oracle.com>

Hi,

Please review this small change:

webrev: http://cr.openjdk.java.net/~rschatz/JDK-8155735/webrev.00/
bug: https://bugs.openjdk.java.net/browse/JDK-8155735

This makes it possible to use the exception stubs without knowing about 
the hotspot-only concept of symbol pointers.

Thanks,
Roland

From volker.simonis at gmail.com  Fri Apr 29 13:53:20 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Fri, 29 Apr 2016 15:53:20 +0200
Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on
	PowerPC
Message-ID: <CA+3eh10NB4L5X-hU0CwZpbb80dWpy9yZVzEtzcYtUs+gz3YJ2g@mail.gmail.com>

Hi Aleksey,

while implementing the new intrinsics for jdk.internal.misc.Unsafe on
PowerPC we encountered problems with the following weakCompareAndSwap
tests:

compiler/unsafe/JdkInternalMiscUnsafeAccessTestInt.java
compiler/unsafe/JdkInternalMiscUnsafeAccessTestLong.java

The problem is that a weakCompareAndSwap may spuriously fail, but the
tests assume that it will always succeed. On Power we saw for example
the following kind of failures:

test JdkInternalMiscUnsafeAccessTestInt.testArray(): failure
java.lang.AssertionError: weakCompareAndSwapRelease int expected
[true] but found [false]
        at org.testng.Assert.fail(Assert.java:94)
        at org.testng.Assert.failNotEquals(Assert.java:494)
        at org.testng.Assert.assertEquals(Assert.java:123)
        at org.testng.Assert.assertEquals(Assert.java:286)
        at JdkInternalMiscUnsafeAccessTestInt.testAccess(JdkInternalMiscUnsafeAccessTestInt.java:269)
        at JdkInternalMiscUnsafeAccessTestInt.testArray(JdkInternalMiscUnsafeAccessTestInt.java:109)

From our understanding of the weakCompareAndSwap semantics, it is
however perfectly legal for the weakCompareAndSwap to fail and the old
values to remain in place.

We would propose to run the weakCompareAndSwap in a loop in the test
until it succeeds and only check for the correct value in memory
afterwards. We could also log the number of retry attempts until the
weakCompareAndSwap succeeds into the .jtr file.

As far as we saw, all the regression tests are single threaded. Are
there any multi-threaded tests available which stress the new Unsafe
intrinsics (i.e. in the way this was previously done by the Java
Concurrency Torture tests which verified that only valid combinations
of reads were observed)?

Regards,
Martin & Volker
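The retry-then-check test shape proposed above can be sketched with the public AtomicInteger.weakCompareAndSetVolatile (Java 9+) instead of jdk.internal.misc.Unsafe, which ordinary code cannot access. The helper name and the limit of 10 attempts are assumptions for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;

class WeakCasRetry {
    // Retries a weak CAS until it succeeds or maxAttempts is reached.
    // Returns the number of attempts used, or -1 if every attempt failed.
    static int weakCasWithRetry(AtomicInteger cell, int expected, int update, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (cell.weakCompareAndSetVolatile(expected, update)) {
                return attempt;
            }
            // A false result may be a spurious failure, so simply try again.
        }
        return -1;
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(1);
        int attempts = weakCasWithRetry(cell, 1, 2, 10);
        if (attempts < 0) {
            throw new AssertionError("weak CAS never succeeded");
        }
        // Check the value in memory only after the loop, as proposed.
        if (cell.get() != 2) {
            throw new AssertionError("expected 2 but found " + cell.get());
        }
        System.out.println("value=2 after " + attempts + " attempt(s)");
    }
}
```

Returning the attempt count keeps the option of logging unusually high retry counts without turning them into test failures.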

From aleksey.shipilev at oracle.com  Fri Apr 29 14:04:33 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 29 Apr 2016 17:04:33 +0300
Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on
	PowerPC
In-Reply-To: <CA+3eh10NB4L5X-hU0CwZpbb80dWpy9yZVzEtzcYtUs+gz3YJ2g@mail.gmail.com>
References: <CA+3eh10NB4L5X-hU0CwZpbb80dWpy9yZVzEtzcYtUs+gz3YJ2g@mail.gmail.com>
Message-ID: <572369F1.3010306@oracle.com>

Hi Volker,

On 04/29/2016 04:53 PM, Volker Simonis wrote:
> The problem is that a weakCompareAndSwap may spuriously fail, but the
> tests assume that it will always succeed. On Power we saw for example
> the following kind of failures:
> 
> test JdkInternalMiscUnsafeAccessTestInt.testArray(): failure
> java.lang.AssertionError: weakCompareAndSwapRelease int expected
> [true] but found [false]
>         at org.testng.Assert.fail(Assert.java:94)
>         at org.testng.Assert.failNotEquals(Assert.java:494)
>         at org.testng.Assert.assertEquals(Assert.java:123)
>         at org.testng.Assert.assertEquals(Assert.java:286)
>         at JdkInternalMiscUnsafeAccessTestInt.testAccess(JdkInternalMiscUnsafeAccessTestInt.java:269)
>         at JdkInternalMiscUnsafeAccessTestInt.testArray(JdkInternalMiscUnsafeAccessTestInt.java:109)
> 
> From our understanding of the weakCompareAndSwap semantics it is
> however perfectly legal that the weakCompareAndSwap fails and the old
> values remain in place.

Ah, oops, indeed! The test is too strong, and it just happens to work on
x86. AArch64 should fail the same, once intrinsics are there.
jdk/test/java/lang/invoke/VarHandles should fail too, so it deserves a
global testbug fix.

Can you prototype and test a simple weakCAS/CAE test on the POWER? We
can then propagate the test shape to other tests with:
  https://bugs.openjdk.java.net/browse/JDK-8155739


> We would propose to run the weakCompareAndSwap in a loop in the test
> until it succeeds and only check for the correct value in memory
> afterwards. We could also log the number of retry attempts until the
> weakCompareAndSwap succeeds into the .jtr file.

Yes, looping is better for the tests.


> As far as we saw, all the regression tests are single threaded. Are
> there any multi-threaded tests available which stress the new Unsafe
> intrinsics (i.e. in the way this was previously done by the Java
> Concurrency Torture tests which verified that only valid combinations
> of reads were observed).

Yes, we have the jcstress tests pending for VarHandles, but they are not
yet committed to jcstress.

Thanks,
-Aleksey
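jcstress is a separate harness that checks which combinations of observed values are allowed. As a much weaker illustration of a multi-threaded check, the Java sketch below has several threads increment a shared counter through weak-CAS retry loops and then verifies that no update was lost. It is not a jcstress test: it only checks the final outcome, and the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class WeakCasStress {
    static void increment(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            // A spurious failure (or a lost race) just causes another attempt.
            if (counter.weakCompareAndSetVolatile(current, current + 1)) {
                return;
            }
        }
    }

    static int run(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment(counter);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // 400000 if no increment was lost
    }
}
```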



From rwestrel at redhat.com  Fri Apr 29 14:05:26 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 29 Apr 2016 16:05:26 +0200
Subject: [aarch64-port-dev ] RFR: 8155617: aarch64: ClearArray does not
	use	DC ZVA
In-Reply-To: <1461851388.10531.27.camel@mint>
References: <1461851388.10531.27.camel@mint>
Message-ID: <dk6a8kcecih.fsf@rwestrel.remote.csb>


Hi Ed,

I can't start a debug VM since that change was pushed:

#  Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/share/vm/runtime/stubRoutines.cpp:344), pid=16911, tid=16912
#  assert(s.body[i] == 32) failed: what?

I think the problem is MacroAssembler::fill_words() moves the pointer
past the last word that was written (i.e. there's a hole between the last
written word and the value of the base). It passes again with this
patch:

diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
@@ -4754,12 +4754,12 @@
   const int unroll = 8; // Number of stp instructions we'll unroll
 
   cbz(cnt, fini);
-  tbz(base, 3, skip);
+  tbz(cnt, 0, skip);
   str(value, Address(post(base, 8)));
   sub(cnt, cnt, 1);
   bind(skip);
 
-  andr(rscratch1, cnt, (unroll-1) * 2);
+  andr(rscratch1, cnt, unroll* 2 - 1);
   sub(cnt, cnt, rscratch1);
   add(base, base, rscratch1, Assembler::LSL, 3);
   adr(rscratch2, entry);
@@ -4767,15 +4767,14 @@
   br(rscratch2);
 
   bind(loop);
-  for (int i = -unroll; i < 0; i++)
+  add(base, base, unroll * 16);
+  sub(cnt, cnt, unroll * 2);
+  for (int i = -unroll; i < 0; i++) {
     stp(value, value, Address(base, i * 16));
+  }
   bind(entry);
-  subs(cnt, cnt, unroll * 2);
-  add(base, base, unroll * 16);
-  br(Assembler::GE, loop);
-
-  tbz(cnt, 0, fini);
-  str(value, Address(base, -unroll * 16));
+  cbnz(cnt, loop);
+
   bind(fini);
 }
 

I've done very little testing with it but a few things felt strange in
that code:

- I don't understand why aligning the base at the beginning is
  necessary. Once the base is aligned, there's no guarantee that cnt is an
  even number of words, so when we update the base with:

add(base, base, rscratch1, Assembler::LSL, 3);

base can be misaligned again. I changed it to test whether cnt is even,
so at least we don't have to worry about that anymore.

- (unroll-1) * 2 seems like an incorrect mask.

- then there's the problem of base being updated at the end of the loop
  that I mentioned above and that can lead to an incorrect base.

- That part at the end:
 str(value, Address(base, -unroll * 16));

I don't understand.

Anyway, as I said I've done very little testing so it's entirely
possible my patch above is broken.

Roland.
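To check the patched control flow of fill_words() independently of the assembler, it can be simulated on an array of words. The Java sketch below is such a simulation, not the AArch64 code: it counts in words rather than bytes, and it models the computed jump (adr/br into the unrolled loop) as executing only the last rem/2 paired stores. That is an assumption, because the hunk omits the line between adr and br. Running it over a range of counts checks that exactly cnt words are written and nothing before or after them.

```java
// Word-level simulation of the patched fill_words() control flow (hypothetical model).
class FillWordsSim {
    static final int UNROLL = 8;  // number of paired stores in the unrolled body

    static void fillWords(long[] mem, int base, int cnt, long value) {
        if (cnt == 0) return;                     // cbz(cnt, fini)
        if ((cnt & 1) != 0) {                     // tbz(cnt, 0, skip): odd count stores one word
            mem[base++] = value;                  // str(value, Address(post(base, 8)))
            cnt--;
        }
        int rem = cnt & (UNROLL * 2 - 1);         // andr(rscratch1, cnt, unroll * 2 - 1)
        cnt -= rem;
        base += rem;                              // base now points past the partial block
        // Computed jump: only the last rem/2 paired stores of the body execute.
        for (int i = -rem / 2; i < 0; i++) {
            mem[base + 2 * i] = value;
            mem[base + 2 * i + 1] = value;
        }
        while (cnt != 0) {                        // cbnz(cnt, loop)
            base += UNROLL * 2;                   // add(base, base, unroll * 16)
            cnt -= UNROLL * 2;                    // sub(cnt, cnt, unroll * 2)
            for (int i = -UNROLL; i < 0; i++) {   // stp(value, value, Address(base, i * 16))
                mem[base + 2 * i] = value;
                mem[base + 2 * i + 1] = value;
            }
        }
    }

    public static void main(String[] args) {
        for (int cnt = 0; cnt <= 40; cnt++) {
            long[] mem = new long[64];
            fillWords(mem, 3, cnt, 32L);
            for (int j = 0; j < mem.length; j++) {
                long expected = (j >= 3 && j < 3 + cnt) ? 32L : 0L;
                if (mem[j] != expected) {
                    throw new AssertionError("cnt=" + cnt + " index=" + j);
                }
            }
        }
        System.out.println("every count filled exactly");
    }
}
```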
 

From martin.doerr at sap.com  Fri Apr 29 14:06:19 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 29 Apr 2016 14:06:19 +0000
Subject: RFR(S): 8155729: C2: Skip transformation of LoadConP for heap-based
	compressed oops
Message-ID: <41d17a7145394a0aa7e1ebcd890ccd73@DEWDFE13DE14.global.corp.sap>

Hi,

I have opened a new bug for the proposal which was discussed in thread 8154826.
The summary is:

C2's final graph reshaping performs the following transformation:
Original pattern: LoadConP + Storage access
Transformed pattern: LoadConN + DecodeN heap-based + Storage access

This seems to be fine for the simpler compressed oops modes. It also seems to be fine on x86, which can match the decoding into the operand of the storage access instruction.

Other platforms should rather skip the transformation:
- PPC can load the ConP from the constant pool. Decoding takes a lot of instructions, because the heap base needs to be loaded.
- SPARC can use that as well.
- AArch64: LoadConN+DecodeN has a higher latency than LoadConP.

We can always skip the transformation in heap-based compressed klass mode, since matching DecodeNKlass as an operand is currently not implemented.

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8155729_LoadConP/webrev.00/

Please review. I will also need a sponsor, please.

Best regards,
Martin
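The proposed rule can be written as a small decision function. The Java sketch below is purely illustrative: the mode names loosely follow HotSpot's narrow-oop modes (the disjoint-base mode is left out), and the canMatchDecodeIntoAddress and isKlass parameters are hypothetical stand-ins for a platform query and for the DecodeNKlass case. The decode helper shows why only heap-based mode is expensive: it is the only mode that needs the heap base available.

```java
// Hypothetical model of when LoadConP should become LoadConN + DecodeN.
class LoadConPDecision {
    enum NarrowOopMode { UNSCALED, ZERO_BASED, HEAP_BASED }

    // Decoding a narrow value: only heap-based mode needs the base.
    static long decode(long narrow, long heapBase, int shift, NarrowOopMode mode) {
        switch (mode) {
            case UNSCALED:
                return narrow;                        // narrow value is the address
            case ZERO_BASED:
                return narrow << shift;               // shift only
            default:
                return heapBase + (narrow << shift);  // HEAP_BASED: base must be loaded
        }
    }

    static boolean transformLoadConP(NarrowOopMode mode, boolean canMatchDecodeIntoAddress, boolean isKlass) {
        if (mode != NarrowOopMode.HEAP_BASED) {
            return true;                      // decoding is cheap without a heap base
        }
        if (isKlass) {
            return false;                     // DecodeNKlass cannot be matched as an operand
        }
        return canMatchDecodeIntoAddress;     // e.g. x86 folds the decode into the address
    }

    public static void main(String[] args) {
        System.out.println(transformLoadConP(NarrowOopMode.HEAP_BASED, false, false)); // false
        System.out.println(transformLoadConP(NarrowOopMode.HEAP_BASED, true, false));  // true
    }
}
```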


From goetz.lindenmaier at sap.com  Fri Apr 29 14:09:38 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 29 Apr 2016 14:09:38 +0000
Subject: RFR(XS): 8155738: C2: fix frame_complete_offset
Message-ID: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap>

Hi,

In C2, frame_complete_offset is set incorrectly.  It is also called during scratch_emit_size,
which happens at least in the debug build.
I also added the missing call on ppc.

Please review this small fix.  I also need a sponsor, please:
http://cr.openjdk.java.net/~goetz/wr16/8155738-frameCmplt/webrev.00/

Thanks and best regards,
  Goetz.

From martin.doerr at sap.com  Fri Apr 29 14:46:18 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 29 Apr 2016 14:46:18 +0000
Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on
	PowerPC
In-Reply-To: <572369F1.3010306@oracle.com>
References: <CA+3eh10NB4L5X-hU0CwZpbb80dWpy9yZVzEtzcYtUs+gz3YJ2g@mail.gmail.com>
	<572369F1.3010306@oracle.com>
Message-ID: <35a9d588d5eb41e5ae017938de789f3c@DEWDFE13DE14.global.corp.sap>

Hi Aleksey,

I have used the following test block on ppc64le. More than 2 attempts in the single-threaded test were never observed.
So it would be interesting to report it somehow if that happens.

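        {
            int attempts = 0;
            boolean r;
            // A weak CAS may fail spuriously, so retry a bounded number of times.
            do {
                attempts++;
                r = UNSAFE.weakCompareAndSwapInt(base, offset, 1, 2);
            } while (!r && attempts < 10);
            // More than two attempts were never observed single-threaded; report it if it happens.
            if (attempts > 2) {
                reportMessage(attempts, "weakCompareAndSwap int 1 -> 2");
            }
            // Check the value in memory only after the loop.
            int x = UNSAFE.getInt(base, offset);
            assertEquals(x, 2, "weakCompareAndSwap int value 1 -> 2");
        }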

Is this what you mean by simple weakCAS test prototype?

Best regards,
Martin


-----Original Message-----
From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] 
Sent: Freitag, 29. April 2016 16:05
To: Volker Simonis <volker.simonis at gmail.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Cc: Doerr, Martin <martin.doerr at sap.com>
Subject: Re: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on PowerPC

* PGP Signature not checked

Hi Volker,

On 04/29/2016 04:53 PM, Volker Simonis wrote:
> The problem is that a weakComapreAndSwap may spuriously fail, but the
> tests assume that it will always succeed. On Power we saw for example
> the following kind of failures:
> 
> test JdkInternalMiscUnsafeAccessTestInt.testArray(): failure
> java.lang.AssertionError: weakCompareAndSwapRelease int expected
> [true] but found [false]
>         at org.testng.Assert.fail(Assert.java:94)
>         at org.testng.Assert.failNotEquals(Assert.java:494)
>         at org.testng.Assert.assertEquals(Assert.java:123)
>         at org.testng.Assert.assertEquals(Assert.java:286)
>         at JdkInternalMiscUnsafeAccessTestInt.testAccess(JdkInternalMiscUnsafeAccessTestInt.java:269)
>         at JdkInternalMiscUnsafeAccessTestInt.testArray(JdkInternalMiscUnsafeAccessTestInt.java:109)
> 
> From our understanding of the weakComapreAndSwap semantics it is
> however perfectly legal that the weakComapreAndSwap fails and the old
> values remain in place.

Ah, oops, indeed! The test is too strong, and it just happens to work on
x86. AArch64 should fail the same, once intrinsics are there.
jdk/test/java/lang/invoke/VarHandles should fail too, so it deserves a
global testbug fix.

Can you prototype and test a simple weakCAS/CAE test on the POWER? We
can then propagate the test shape to other tests with:
  https://bugs.openjdk.java.net/browse/JDK-8155739


> We would propose to run the weakComapreAndSwap in a loop in the test
> until it succeeds and only check for the correct value in memory
> afterwards. We could also log the number of retry attempts until the
> weakComapreAndSwap succeeds into the .jtr file.

Yes, looping is better for the tests.


> As far as we saw, all the regression tests are single threaded. Are
> there any multi-threaded tests available which stress the new Unsafe
> intrinsics (i.e. in the way this was previously done by the Java
> Concurrency Torture tests which verified that only valid combinations
> of reads were observed).

Yes, we have the jcstress tests pending for VarHandles, but they are not
yet committed to jcstress.

Thanks,
-Aleksey



From aph at redhat.com  Fri Apr 29 14:49:00 2016
From: aph at redhat.com (Andrew Haley)
Date: Fri, 29 Apr 2016 15:49:00 +0100
Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on
	PowerPC
In-Reply-To: <35a9d588d5eb41e5ae017938de789f3c@DEWDFE13DE14.global.corp.sap>
References: <CA+3eh10NB4L5X-hU0CwZpbb80dWpy9yZVzEtzcYtUs+gz3YJ2g@mail.gmail.com>
	<572369F1.3010306@oracle.com>
	<35a9d588d5eb41e5ae017938de789f3c@DEWDFE13DE14.global.corp.sap>
Message-ID: <5723745C.9000009@redhat.com>

On 04/29/2016 03:46 PM, Doerr, Martin wrote:

> I have used the following test block on ppc64le. More than 2
> attempts in the single-threaded test were never observed.  So it
> would be interesting to report it somehow if that happens.

That would be the sort of thing that jcstress reports as INTERESTING.
But it's not a failure.

Andrew.

From edward.nevill at gmail.com  Fri Apr 29 15:30:33 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Fri, 29 Apr 2016 16:30:33 +0100
Subject: [aarch64-port-dev ] RFR: 8155617: aarch64: ClearArray does not
	use	DC ZVA
In-Reply-To: <dk6a8kcecih.fsf@rwestrel.remote.csb>
References: <1461851388.10531.27.camel@mint>
	<dk6a8kcecih.fsf@rwestrel.remote.csb>
Message-ID: <1461943833.12456.16.camel@mint>

Hi Roland,


On Fri, 2016-04-29 at 16:05 +0200, Roland Westrelin wrote:
> Hi Ed,
> 
> I can't start a debug VM since that change was pushed:
> 
> #  Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/share/vm/runtime/stubRoutines.cpp:344), pid=16911, tid=16912
> #  assert(s.body[i] == 32) failed: what?
> 
> I think the problem is MacroAssembler::fill_words() moves the pointer
> past the last word that was written (i.e. there's a hole between the last
> written word and the value of the base). It passes again with this
> patch:

Sorry, I missed the post-condition that base should point to the end.

Could you please try the following patch

diff -r e118111d4433 src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp	Fri Apr 29 14:32:19 2016 +0200
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp	Fri Apr 29 08:08:47 2016 -0700
@@ -4767,15 +4767,15 @@
   br(rscratch2);
 
   bind(loop);
+  add(base, base, unroll * 16);
   for (int i = -unroll; i < 0; i++)
     stp(value, value, Address(base, i * 16));
   bind(entry);
   subs(cnt, cnt, unroll * 2);
-  add(base, base, unroll * 16);
   br(Assembler::GE, loop);
 
   tbz(cnt, 0, fini);
-  str(value, Address(base, -unroll * 16));
+  str(value, Address(post(base, 8)));
   bind(fini);
 }
 

I believe the only thing wrong with the code is that it exits without the base being updated as expected.

> 
> diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> @@ -4754,12 +4754,12 @@
>    const int unroll = 8; // Number of stp instructions we'll unroll
>  
>    cbz(cnt, fini);
> -  tbz(base, 3, skip);
> +  tbz(cnt, 0, skip);
>    str(value, Address(post(base, 8)));
>    sub(cnt, cnt, 1);
>    bind(skip);

The intention here is to align the base, not ensure the count is even. An odd count is handled by the final store.

>  
> -  andr(rscratch1, cnt, (unroll-1) * 2);
> +  andr(rscratch1, cnt, unroll* 2 - 1);

The mask (unroll-1) * 2 ensures rscratch1 is even. With an unroll value of 8 the mask is 0xe, giving an even number of words to be stored by the jump into the loop.


>    sub(cnt, cnt, rscratch1);
>    add(base, base, rscratch1, Assembler::LSL, 3);
>    adr(rscratch2, entry);
> @@ -4767,15 +4767,14 @@
>    br(rscratch2);
>  
>    bind(loop);
> -  for (int i = -unroll; i < 0; i++)
> +  add(base, base, unroll * 16);
> +  sub(cnt, cnt, unroll * 2);
> +  for (int i = -unroll; i < 0; i++) {
>      stp(value, value, Address(base, i * 16));
> +  }
>    bind(entry);
> -  subs(cnt, cnt, unroll * 2);
> -  add(base, base, unroll * 16);
> -  br(Assembler::GE, loop);

I need to do subs / b GE here because cnt may be odd. Therefore I cannot use cbnz cnt, ... because the count may never become zero.

Moving the update of the base to the head of the loop is correct as that ensures the base points correctly to the end on exit.

> -
> -  tbz(cnt, 0, fini);
> -  str(value, Address(base, -unroll * 16));
> +  cbnz(cnt, loop);
> +
>    bind(fini);
>  }
>  
> 
> I've done very little testing with it but a few things felt strange in
> that code:
> 
> - I don't understand why aligning the base at the beginning is
>   necessary. Once the base is aligned there's no guarantee the cnt is an
>   even number of words. So when we update the base with:
> 
> add(base, base, rscratch1, Assembler::LSL, 3);

Yes, but the mask (unroll-1) * 2 ensures rscratch1 is even.

> 
> base can be misaligned again. I changed it to test whether cnt is even
> so at least we don't have to worry about that anymore.
> 
> - (unroll-1) * 2 seems like an incorrect mask.
> 
> - then there's the problem of base being updated at the end of the loop
>   that I mentioned above and that can lead to an incorrect base.

Yes, sorry I completely missed that postcondition.

> 
> - That part at the end:
>  str(value, Address(base, -unroll * 16));

Because I did the update of the base at the end, base points unroll * 16 beyond the end. So if there is an odd word at the end it is stored at offset -unroll * 16.

But updating the base at the head of the loop ensures it is pointing to the end on exit. Then if there is an odd word at the end we can use post(base, 8) to store the odd word and ensure the base is updated.
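To check the post-condition, here is a hypothetical Java model of the patched fill_words (not HotSpot code). Assumptions: memory is a long[] indexed in words, index 0 is 16-byte aligned (so an odd index corresponds to bit 3 of the address being set), and the computed branch into the unrolled loop is modelled by running only the last partial/2 store-pairs. With base advanced at the head of the loop, the returned index is always start + cnt:

```java
public class FillWordsSketch {
    static final int UNROLL = 8; // number of unrolled stp (store-pair) instructions

    // Fills cnt words starting at word index base and returns the final base,
    // which must equal the initial base + cnt (the post-condition that was missed).
    static int fillWords(long[] mem, int base, int cnt, long value) {
        if (cnt == 0) {                               // cbz(cnt, fini)
            return base;
        }
        if ((base & 1) != 0) {                        // tbz(base, 3, skip): not 16-byte aligned
            mem[base++] = value;                      // str(value, Address(post(base, 8)))
            cnt--;
        }
        int partial = cnt & ((UNROLL - 1) * 2);       // andr(rscratch1, cnt, 0xe): always even
        cnt -= partial;
        base += partial;                              // add(base, base, rscratch1, LSL, 3)
        // Branching into the unrolled loop runs only its last partial/2 stp's,
        // which fill the words just below the already-advanced base.
        for (int i = -partial / 2; i < 0; i++) {
            mem[base + 2 * i] = value;
            mem[base + 2 * i + 1] = value;
        }
        // entry: subs(cnt, cnt, unroll * 2); br(GE, loop)
        while ((cnt -= UNROLL * 2) >= 0) {
            base += UNROLL * 2;                       // loop head: advance base first
            for (int i = -UNROLL; i < 0; i++) {       // stp(value, value, Address(base, i * 16))
                mem[base + 2 * i] = value;
                mem[base + 2 * i + 1] = value;
            }
        }
        // cnt is now -16 or -15; bit 0 still says whether one odd word is left.
        if ((cnt & 1) != 0) {                         // tbz(cnt, 0, fini)
            mem[base++] = value;                      // str(value, Address(post(base, 8)))
        }
        return base;                                  // fini
    }
}
```

Because cnt after the loop is -16 plus the odd bit, testing bit 0 still works even though the count never reaches zero, which is why subs/b.GE is used instead of cbnz.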

> 
> I don't understand.
> 
> Anyway, as I said I've done very little testing so it's entirely
> possible my patch above is broken.

Thanks for finding this and sorry again. I'll give the above patch a good testing and submit it for RFR. If you could also try it that would be appreciated.

All the best,
Ed.


From tom.rodriguez at oracle.com  Fri Apr 29 15:27:39 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 29 Apr 2016 08:27:39 -0700
Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI
	exception stubs
In-Reply-To: <5723657C.6020804@oracle.com>
References: <5723657C.6020804@oracle.com>
Message-ID: <B07E60BA-89D1-4BE1-8F17-91EF4887CC8C@oracle.com>


> On Apr 29, 2016, at 6:45 AM, Roland Schatz <roland.schatz at oracle.com> wrote:
> 
> Hi,
> 
> Please review this small change:
> 
> webrev: http://cr.openjdk.java.net/~rschatz/JDK-8155735/webrev.00/
> bug: https://bugs.openjdk.java.net/browse/JDK-8155735
> 
> This makes it possible to use the exception stubs without knowing about the hotspot-only concept of symbol pointers.

I think you need to handle the case where the symbol doesn't exist.  lookup_symbol will return NULL if there's no currently loaded symbol with that name, which will lead to a SEGV later.  So you either need to throw an exception here or you should use 

TempNewSymbol symbol = SymbolTable::new_symbol

and let new_exception throw an exception if the class named by symbol doesn't exist.

tom


> 
> Thanks,
> Roland


From dmitrij.pochepko at oracle.com  Fri Apr 29 15:29:34 2016
From: dmitrij.pochepko at oracle.com (dmitrij pochepko)
Date: Fri, 29 Apr 2016 18:29:34 +0300
Subject: RFR(XS): 8155163 - JVMCI:
	MethodHandleAccessProvider.resolveInvokeBasicTarget implementation
	doesn't match javadoc
In-Reply-To: <CCF91280-E377-4852-B49D-E60DBFE476D3@oracle.com>
References: <57221E11.4000406@oracle.com>
	<CCF91280-E377-4852-B49D-E60DBFE476D3@oracle.com>
Message-ID: <57237DDE.5000607@oracle.com>

Thank you!
> Looks good.
>
>> On Apr 28, 2016, at 4:28 AM, Dmitrij Pochepko 
>> <dmitrij.pochepko at oracle.com <mailto:dmitrij.pochepko at oracle.com>> wrote:
>>
>> Hi,
>>
>> please review fix for JDK-8155163 - JVMCI: 
>> MethodHandleAccessProvider.resolveInvokeBasicTarget implementation 
>> doesn't match javadoc
>>
>> MethodHandleAccessProvider.resolveInvokeBasicTarget implementation 
>> throws NPE in case 1st parameter doesn't represent MethodHandle, but 
>> according to javadoc it should return null in such case. So, a small 
>> fix should be applied to match javadoc.
>>
>> CR: https://bugs.openjdk.java.net/browse/JDK-8155163
>> Webrev: http://cr.openjdk.java.net/~dpochepk/8155163/webrev.01/
>>
>> I've tested this change on linux_x64 via jprt.
>>
>> Thanks,
>> Dmitrij
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160429/aecd89f4/attachment-0001.html>

From rwestrel at redhat.com  Fri Apr 29 15:49:12 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 29 Apr 2016 17:49:12 +0200
Subject: RFR(S): 8154943: AArch64: redundant address computation
	instructions with vectorization
In-Reply-To: <572267D5.8020605@oracle.com>
References: <dk64malofn1.fsf@rwestrel.remote.csb> <572267D5.8020605@oracle.com>
Message-ID: <57238278.5030607@redhat.com>

> node.cpp change is good.
> 
> compile.cpp I understand when you replace "similar" (same in(1)) node
> with it but it is not clear that you are also processing users (the whole
> following chain) to remove similar nodes. Add a comment.
> I think the check "!(k->Opcode() == Op_ConvI2L || ... " (and use
> 'continue' instead of 'break') should be done when you push a node on
> the list wq.push(u).

Thanks for the review, Vladimir. What about this?

http://cr.openjdk.java.net/~roland/8154943/webrev.01/

Roland.

From roland.schatz at oracle.com  Fri Apr 29 15:55:26 2016
From: roland.schatz at oracle.com (Roland Schatz)
Date: Fri, 29 Apr 2016 17:55:26 +0200
Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI
	exception stubs
In-Reply-To: <B07E60BA-89D1-4BE1-8F17-91EF4887CC8C@oracle.com>
References: <5723657C.6020804@oracle.com>
	<B07E60BA-89D1-4BE1-8F17-91EF4887CC8C@oracle.com>
Message-ID: <572383EE.5060608@oracle.com>

On 04/29/2016 05:27 PM, Tom Rodriguez wrote:
> I think you need to handle the case where the symbol doesn't exist.  lookup_symbol will return NULL if there's no currently loaded symbol with that name which will lead to a SEGV later.  So you either need to throw an exception here or you should use
>
> TempNewSymbol symbol = SymbolTable::new_symbol

SymbolTable::lookup should add it if it doesn't exist.

> and let new_exception throw an exception if the class named by symbol doesn't exist.

throw_and_post_jvmti_exception will throw a `NoClassDefFoundError` when 
the exception class doesn't exist.


Just to be sure, I tested this using graal. Temporarily passing in a 
wrong class name into the stub results in:
> java.lang.NoClassDefFoundError: bla/java/lang/NullPointerException
>         at 
> com.oracle.graal.replacements.test.CompiledNullPointerExceptionTest.testSnippet(CompiledNullPointerExceptionTest.java:93)
>         at jdk.vm.ci.hotspot.CompilerToVM.executeInstalledCode(Native 
> Method)
>         at 
> jdk.vm.ci.hotspot.HotSpotNmethod.executeVarargs(HotSpotNmethod.java:100)
>         at 
> com.oracle.graal.compiler.test.GraalCompilerTest.executeActual(GraalCompilerTest.java:578)
> [...]


- Roland


From rwestrel at redhat.com  Fri Apr 29 15:56:11 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 29 Apr 2016 17:56:11 +0200
Subject: [aarch64-port-dev ] RFR: 8155617: aarch64: ClearArray does not
	use DC ZVA
In-Reply-To: <1461943833.12456.16.camel@mint>
References: <1461851388.10531.27.camel@mint>
	<dk6a8kcecih.fsf@rwestrel.remote.csb> <1461943833.12456.16.camel@mint>
Message-ID: <5723841B.3030607@redhat.com>

Hi Ed,

> Could you please try the following patch

I tried your patch and the debug VM starts ok now.

Roland.

From dmitrij.pochepko at oracle.com  Fri Apr 29 16:00:52 2016
From: dmitrij.pochepko at oracle.com (dmitrij pochepko)
Date: Fri, 29 Apr 2016 19:00:52 +0300
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant
	javadoc should be updated for null JavaKind case
In-Reply-To: <1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com>
References: <57223CC2.6000309@oracle.com>
	<1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com>
Message-ID: <57238534.20405@oracle.com>

Hi,
I've updated the javadoc according to your comment.

Please take a look at updated version:

http://cr.openjdk.java.net/~dpochepk/8155244/webrev.02/

Thanks,
Dmitrij

> + * @throws IllegalArgumentException if {@code kind} is {@code null}, 
> {@code kind} is {@link JavaKind#Void} or
> + * not {@linkplain JavaKind#isPrimitive() primitive} kind
> I don't think you have to repeat {@code kind}.
>
>> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko 
>> <dmitrij.pochepko at oracle.com <mailto:dmitrij.pochepko at oracle.com>> wrote:
>>
>> Hi,
>>
>> please review changes for 8155244 - JVMCI: 
>> MemoryAccessProvider.readUnsafeConstant javadoc should be updated for 
>> null JavaKind case
>>
>> MemoryAccessProvider.readUnsafeConstant method implementation throws 
>> NullPointerException in case JavaKind is null. Such behavior wasn't 
>> specified in javadoc.
>> So, a small fix contains logic to throw IllegalArgumentException and 
>> respective javadoc update.
>>
>> CR: https://bugs.openjdk.java.net/browse/JDK-8155244
>> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/ 
>> <http://cr.openjdk.java.net/%7Edpochepk/8155244/webrev.01/>
>>
>> I've tested fix on linux_x64 via jprt.
>>
>> Thanks,
>> Dmitrij
>
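A sketch of the argument check the updated javadoc describes. `Kind` is a hypothetical stand-in for `JavaKind`; the sketch does not assume whether the real `JavaKind.isPrimitive()` returns true for `Void`, because the explicit `VOID` test keeps the check correct either way:

```java
public class KindCheckSketch {
    // Hypothetical stand-in for jdk.vm.ci.meta.JavaKind; only its shape matters here.
    // VOID is marked primitive here only so that the explicit VOID test is exercised.
    enum Kind {
        BOOLEAN(true), BYTE(true), SHORT(true), CHAR(true), INT(true),
        FLOAT(true), LONG(true), DOUBLE(true), OBJECT(false), VOID(true), ILLEGAL(false);

        private final boolean primitive;

        Kind(boolean primitive) {
            this.primitive = primitive;
        }

        boolean isPrimitive() {
            return primitive;
        }
    }

    // Rejects what the javadoc lists: a null kind, Void, or a non-primitive kind,
    // with IllegalArgumentException instead of an accidental NullPointerException.
    static void checkKind(Kind kind) {
        if (kind == null || kind == Kind.VOID || !kind.isPrimitive()) {
            throw new IllegalArgumentException("unexpected kind: " + kind);
        }
    }
}
```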


From vladimir.kozlov at oracle.com  Fri Apr 29 16:06:39 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Apr 2016 09:06:39 -0700
Subject: RFR(S): 8154943: AArch64: redundant address computation
	instructions with vectorization
In-Reply-To: <57238278.5030607@redhat.com>
References: <dk64malofn1.fsf@rwestrel.remote.csb>
	<572267D5.8020605@oracle.com> <57238278.5030607@redhat.com>
Message-ID: <71b64964-0682-2deb-c0d4-0102d7a8a939@oracle.com>

Yes, this looks good.

Thanks,
Vladimir

On 4/29/16 8:49 AM, Roland Westrelin wrote:
>> node.cpp change is good.
>>
>> compile.cpp I understand when you replace "similar" (same in(1)) node
>> with it but it is not clear that you are also processing users (the whole
>> following chain) to remove similar nodes. Add a comment.
>> I think the check "!(k->Opcode() == Op_ConvI2L || ... " (and use
>> 'continue' instead of 'break') should be done when you push a node on
>> the list wq.push(u).
>
> Thanks for the review, Vladimir. What about this?
>
> http://cr.openjdk.java.net/~roland/8154943/webrev.01/
>
> Roland.
>

From vladimir.kozlov at oracle.com  Fri Apr 29 16:09:00 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Apr 2016 09:09:00 -0700
Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling
	analysis
In-Reply-To: <57232234.7020504@redhat.com>
References: <5723201E.5080909@redhat.com> <572320F7.9030608@redhat.com>
	<57232234.7020504@redhat.com>
Message-ID: <5caad142-53f6-ed36-ac4e-e006c441bdc4@oracle.com>

The changes look good. I will run tests with it and sponsor it.

Thanks,
Vladimir


On 4/29/16 1:58 AM, Roland Westrelin wrote:
>
>> OK, cool.  Will Michael sponsor this, then?
>
> Thanks for reviewing it. I don't think Michael has that power.
>
> Roland.
>

From vladimir.kozlov at oracle.com  Fri Apr 29 16:55:57 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Apr 2016 09:55:57 -0700
Subject: RFR(XS): 8155738: C2: fix frame_complete_offset
In-Reply-To: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap>
References: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap>
Message-ID: <5b697841-5471-cee3-7136-7a334ee2ca25@oracle.com>

"In the debug build the call at output.cpp:1831 build if (n->size(_regalloc) < (current_offset-instr_offset)) overwrites 
the proper value."

So the fix is to not reset frame_complete_offset (Compile::_code_offsets._values[Frame_Complete]) when a node is emitted 
into the scratch buffer.
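A minimal model of the bug and the fix, in hypothetical Java rather than C2 code: the same emission routine also runs into a scratch buffer just to measure size, so an offset recorded there is relative to the scratch buffer and must not overwrite the value recorded for the real code buffer. All names and sizes are assumptions:

```java
public class ScratchEmitSketch {
    // Hypothetical code buffer that only tracks how many bytes were emitted.
    static final class CodeBuffer {
        final boolean scratch; // true when emitting only to measure size
        int offset;

        CodeBuffer(boolean scratch, int offset) {
            this.scratch = scratch;
            this.offset = offset;
        }
    }

    static final int PROLOGUE_BYTES = 16; // hypothetical prologue size

    int frameCompleteOffset = -1; // offset in the real code buffer

    // Emits a fake prologue; records the frame-complete offset only for real emission.
    void emitPrologue(CodeBuffer cb) {
        cb.offset += PROLOGUE_BYTES;
        if (!cb.scratch) { // the guard: scratch offsets mean nothing for the real method
            frameCompleteOffset = cb.offset;
        }
    }

    // Measures the prologue size by emitting it into a throwaway scratch buffer.
    int scratchEmitSize() {
        CodeBuffer scratch = new CodeBuffer(true, 0);
        emitPrologue(scratch);
        return scratch.offset;
    }
}
```

Without the `!cb.scratch` guard, calling `scratchEmitSize()` after the real emission would silently replace the recorded offset with the scratch-relative one.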

The changes look good.

Thanks,
Vladimir

On 4/29/16 7:09 AM, Lindenmaier, Goetz wrote:
> Hi,
>
>
>
> In C2, frame_complete_offset is set wrong.  It's also called during scratch_emit_size,
>
> which happens at least in the debug build.
>
> I also added the missing call on ppc.
>
>
>
> Please review this small fix.  I need a sponsor, please:
>
> http://cr.openjdk.java.net/~goetz/wr16/8155738-frameCmplt/webrev.00/
>
>
>
> Thanks and best regards,
>
>   Goetz.
>

From vladimir.kozlov at oracle.com  Fri Apr 29 17:05:14 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Apr 2016 10:05:14 -0700
Subject: RFR(S): 8155729: C2: Skip transformation of LoadConP for
	heap-based compressed oops
In-Reply-To: <41d17a7145394a0aa7e1ebcd890ccd73@DEWDFE13DE14.global.corp.sap>
References: <41d17a7145394a0aa7e1ebcd890ccd73@DEWDFE13DE14.global.corp.sap>
Message-ID: <9f7bd5c7-1f40-5129-ca80-3d83e035c4e0@oracle.com>

Looks good to me. We need corresponding changes in our closed code too.

Thanks,
Vladimir

On 4/29/16 7:06 AM, Doerr, Martin wrote:
> Hi,
>
>
>
> I have opened a new bug for the proposal which was discussed in thread 8154826.
>
> The summary is:
>
>
>
> C2's final graph reshaping performs the following transformation:
>
> Original pattern: LoadConP + Storage access
>
> Transformed pattern: LoadConN + DecodeN heap-based + Storage access
>
>
>
> This seems to be fine for simpler compressed oops mode. It also seems to be fine on x86 which can match the decoding
> into the operand of the Storage access instruction.
>
>
>
> Other platforms should better skip the transformation:
>
> - PPC can load the ConP from the constant pool. Decoding takes a lot of instructions, because the heap base needs to be loaded.
>
> - SPARC can use that as well.
>
> - aarch64: LoadConN+DecodeN has a higher latency than LoadConP.
>
>
>
> We can always skip the transformation in heap-based compressed klass mode. Matching DecodeNKlass as operand is currently
> not implemented.
>
>
>
> Webrev is here:
>
> http://cr.openjdk.java.net/~mdoerr/8155729_LoadConP/webrev.00/
>
>
>
> Please review. I will also need a sponsor, please.
>
>
>
> Best regards,
>
> Martin
>
>
>
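The decision described in this RFR can be sketched as two predicates. This is a hypothetical Java model: the enum, the method names and the `canFoldDecodeIntoAddress` flag are illustrative and are not the names used in the webrev:

```java
public class ConstOopDecodeSketch {
    // Hypothetical compressed-pointer modes; only HEAP_BASED needs the heap base to decode.
    enum Mode { UNSCALED, ZERO_BASED, HEAP_BASED }

    // Should LoadConP be rewritten into LoadConN + DecodeN for an oop constant?
    // canFoldDecodeIntoAddress models a platform (like x86) that can match the
    // decode into the operand of the storage access instruction.
    static boolean preferDecodeForOop(Mode oopMode, boolean canFoldDecodeIntoAddress) {
        if (oopMode != Mode.HEAP_BASED) {
            return true; // simpler modes: the transformation is fine
        }
        return canFoldDecodeIntoAddress; // PPC, SPARC, aarch64 would skip it
    }

    // For klass constants: always skip in heap-based mode, since matching
    // DecodeNKlass as an operand is not implemented.
    static boolean preferDecodeForKlass(Mode klassMode) {
        return klassMode != Mode.HEAP_BASED;
    }
}
```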

From christian.thalinger at oracle.com  Fri Apr 29 18:51:04 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 29 Apr 2016 08:51:04 -1000
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant
	javadoc should be updated for null JavaKind case
In-Reply-To: <57238534.20405@oracle.com>
References: <57223CC2.6000309@oracle.com>
	<1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com>
	<57238534.20405@oracle.com>
Message-ID: <15DCAD37-3E4D-456E-A569-CB91070B4B69@oracle.com>

Looks good.

> On Apr 29, 2016, at 6:00 AM, dmitrij pochepko <dmitrij.pochepko at oracle.com> wrote:
> 
> Hi,
> I've updated javadoc according to comment.
> 
> Please take a look at updated version:
> 
> http://cr.openjdk.java.net/~dpochepk/8155244/webrev.02/ <http://cr.openjdk.java.net/~dpochepk/8155244/webrev.02/>
> 
> Thanks,
> Dmitrij
> 
>> +     * @throws IllegalArgumentException if {@code kind} is {@code null}, {@code kind} is {@link JavaKind#Void} or
>> +     *             not {@linkplain JavaKind#isPrimitive() primitive} kind
>> I don't think you have to repeat {@code kind}.
>> 
>>> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko <dmitrij.pochepko at oracle.com <mailto:dmitrij.pochepko at oracle.com>> wrote:
>>> 
>>> Hi,
>>> 
>>> please review changes for 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
>>> 
>>> MemoryAccessProvider.readUnsafeConstant method implementation throws NullPointerException in case JavaKind is null. Such behavior wasn't specified in javadoc.
>>> So, a small fix contains logic to throw IllegalArgumentException and respective javadoc update.
>>> 
>>> CR: https://bugs.openjdk.java.net/browse/JDK-8155244 <https://bugs.openjdk.java.net/browse/JDK-8155244>
>>> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/ <http://cr.openjdk.java.net/%7Edpochepk/8155244/webrev.01/>
>>> 
>>> I've tested fix on linux_x64 via jprt.
>>> 
>>> Thanks,
>>> Dmitrij
>> 
> 


From christian.thalinger at oracle.com  Fri Apr 29 19:29:14 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 29 Apr 2016 09:29:14 -1000
Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI
	exception stubs
In-Reply-To: <572383EE.5060608@oracle.com>
References: <5723657C.6020804@oracle.com>
	<B07E60BA-89D1-4BE1-8F17-91EF4887CC8C@oracle.com>
	<572383EE.5060608@oracle.com>
Message-ID: <8F968C97-90F0-4341-9D0F-B8A5A84748B8@oracle.com>


> On Apr 29, 2016, at 5:55 AM, Roland Schatz <roland.schatz at oracle.com> wrote:
> 
> On 04/29/2016 05:27 PM, Tom Rodriguez wrote:
>> I think you need to handle the case where the symbol doesn't exist.  lookup_symbol will return NULL if there's no currently loaded symbol with that name which will lead to a SEGV later.  So you either need to throw an exception here or you should use
>> 
>> TempNewSymbol symbol = SymbolTable::new_symbol
> 
> SymbolTable::lookup should add it if it doesn't exist.

new_symbol actually calls lookup, but I think you'd still want the TempNewSymbol.  Runtime people would know better.

> 
>> and let new_exception throw an exception if the class named by symbol doesn't exist.
> 
> throw_and_post_jvmti_exception will throw a `NoClassDefFoundError` when the exception class doesn't exist.
> 
> 
> Just to be sure, I tested this using graal. Temporarily passing in a wrong class name into the stub results in:
>> java.lang.NoClassDefFoundError: bla/java/lang/NullPointerException
>>        at com.oracle.graal.replacements.test.CompiledNullPointerExceptionTest.testSnippet(CompiledNullPointerExceptionTest.java:93)
>>        at jdk.vm.ci.hotspot.CompilerToVM.executeInstalledCode(Native Method)
>>        at jdk.vm.ci.hotspot.HotSpotNmethod.executeVarargs(HotSpotNmethod.java:100)
>>        at com.oracle.graal.compiler.test.GraalCompilerTest.executeActual(GraalCompilerTest.java:578)
>> [...]
> 
> 
> - Roland
> 


From dmitrij.pochepko at oracle.com  Fri Apr 29 19:49:05 2016
From: dmitrij.pochepko at oracle.com (dmitrij pochepko)
Date: Fri, 29 Apr 2016 22:49:05 +0300
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant
	javadoc should be updated for null JavaKind case
In-Reply-To: <15DCAD37-3E4D-456E-A569-CB91070B4B69@oracle.com>
References: <57223CC2.6000309@oracle.com>
	<1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com>
	<57238534.20405@oracle.com>
	<15DCAD37-3E4D-456E-A569-CB91070B4B69@oracle.com>
Message-ID: <5723BAB1.8000904@oracle.com>

Thank you!

On 29.04.2016 21:51, Christian Thalinger wrote:
> Looks good.
>
>> On Apr 29, 2016, at 6:00 AM, dmitrij pochepko 
>> <dmitrij.pochepko at oracle.com <mailto:dmitrij.pochepko at oracle.com>> wrote:
>>
>> Hi,
>> I've updated javadoc according to comment.
>>
>> Please take a look at updated version:
>>
>> http://cr.openjdk.java.net/~dpochepk/8155244/webrev.02/
>>
>> Thanks,
>> Dmitrij
>>
>>> + * @throws IllegalArgumentException if {@code kind} is {@code 
>>> null}, {@code kind} is {@link JavaKind#Void} or
>>> + * not {@linkplain JavaKind#isPrimitive() primitive} kind
>>> I don't think you have to repeat {@code kind}.
>>>
>>>> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko 
>>>> <dmitrij.pochepko at oracle.com <mailto:dmitrij.pochepko at oracle.com>> 
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> please review changes for 8155244 - JVMCI: 
>>>> MemoryAccessProvider.readUnsafeConstant javadoc should be updated 
>>>> for null JavaKind case
>>>>
>>>> MemoryAccessProvider.readUnsafeConstant method implementation 
>>>> throws NullPointerException in case JavaKind is null. Such behavior 
>>>> wasn't specified in javadoc.
>>>> So, a small fix contains logic to throw IllegalArgumentException 
>>>> and respective javadoc update.
>>>>
>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8155244
>>>> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/ 
>>>> <http://cr.openjdk.java.net/%7Edpochepk/8155244/webrev.01/>
>>>>
>>>> I've tested fix on linux_x64 via jprt.
>>>>
>>>> Thanks,
>>>> Dmitrij
>>>
>>
>


From tom.rodriguez at oracle.com  Fri Apr 29 19:49:22 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 29 Apr 2016 12:49:22 -0700
Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI
	exception stubs
In-Reply-To: <572383EE.5060608@oracle.com>
References: <5723657C.6020804@oracle.com>
	<B07E60BA-89D1-4BE1-8F17-91EF4887CC8C@oracle.com>
	<572383EE.5060608@oracle.com>
Message-ID: <BBCA456B-8A0F-4A19-B30C-B8C622D35345@oracle.com>


> On Apr 29, 2016, at 8:55 AM, Roland Schatz <roland.schatz at oracle.com> wrote:
> 
> On 04/29/2016 05:27 PM, Tom Rodriguez wrote:
>> I think you need to handle the case where the symbol doesn't exist.  lookup_symbol will return NULL if there's no currently loaded symbol with that name which will lead to a SEGV later.  So you either need to throw an exception here or you should use
>> 
>> TempNewSymbol symbol = SymbolTable::new_symbol
> 
> SymbolTable::lookup should add it if it doesn't exist.

Sorry, I was looking at the comment below lookup instead of the one above it.  It's very odd that both of these exist, though, and that new_symbol just forwards to lookup.

  static Symbol* new_symbol(const char* utf8_buffer, int length, TRAPS) {
  static Symbol* lookup(const char* name, int len, TRAPS);

I do think you want the TempNewSymbol though.

> 
>> and let new_exception throw an exception if the class named by symbol doesn't exist.
> 
> throw_and_post_jvmti_exception will throw a `NoClassDefFoundError` when the exception class doesn't exist.
> 
> 
> Just to be sure, I tested this using graal. Temporarily passing in a wrong class name into the stub results in:
>> java.lang.NoClassDefFoundError: bla/java/lang/NullPointerException
>>        at com.oracle.graal.replacements.test.CompiledNullPointerExceptionTest.testSnippet(CompiledNullPointerExceptionTest.java:93)
>>        at jdk.vm.ci.hotspot.CompilerToVM.executeInstalledCode(Native Method)
>>        at jdk.vm.ci.hotspot.HotSpotNmethod.executeVarargs(HotSpotNmethod.java:100)
>>        at com.oracle.graal.compiler.test.GraalCompilerTest.executeActual(GraalCompilerTest.java:578)
>> [...]

Thanks for checking.

tom

> 
> 
> - Roland
> 


From dean.long at oracle.com  Fri Apr 29 20:51:15 2016
From: dean.long at oracle.com (Dean Long)
Date: Fri, 29 Apr 2016 13:51:15 -0700
Subject: RFR(XS): 8155738: C2: fix frame_complete_offset
In-Reply-To: <5b697841-5471-cee3-7136-7a334ee2ca25@oracle.com>
References: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap>
	<5b697841-5471-cee3-7136-7a334ee2ca25@oracle.com>
Message-ID: <f5a84e06-e190-2c57-3791-696ff2ce9390@oracle.com>

Looks good.  I will close JDK-8008415 as a duplicate!

dl


On 4/29/2016 9:55 AM, Vladimir Kozlov wrote:
> "In the debug build the call at output.cpp:1831 build if 
> (n->size(_regalloc) < (current_offset-instr_offset)) overwrites the 
> proper value."
>
> So the fix is to not reset frame_complete_offset 
> (Compile::_code_offsets._values[Frame_Complete]) when node is emitted 
> into scratch buffer.
>
> The changes look good.
>
> Thanks,
> Vladimir
>
> On 4/29/16 7:09 AM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>>
>>
>> In C2, frame_complete_offset is set wrong.  It's also called during 
>> scratch_emit_size,
>>
>> which happens at least in the debug build.
>>
>> I also added the missing call on ppc.
>>
>>
>>
>> Please review this small fix.  I need a sponsor, please:
>>
>> http://cr.openjdk.java.net/~goetz/wr16/8155738-frameCmplt/webrev.00/
>>
>>
>>
>> Thanks and best regards,
>>
>>   Goetz.
>>


From tom.rodriguez at oracle.com  Fri Apr 29 22:04:21 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 29 Apr 2016 15:04:21 -0700
Subject: RFR 8155771: [JVMCI] expose JVM_ACC_IS_CLONEABLE_FAST
Message-ID: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com>

http://cr.openjdk.java.net/~never/8155771/webrev

The new flag JVM_ACC_IS_CLONEABLE_FAST controls whether an object can be cloned by simply performing an allocation and copying the fields. Previously all Cloneable objects were assumed to be cloneable in this way.  Add ResolvedJavaType.isAllocationCloneable to expose this notion.  Tested with Graal.
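A hypothetical Java sketch of what cloning "by simply performing an allocation and copying the fields" means: for a plain Cloneable class such as `Point` here, allocating a fresh instance and copying every instance field gives the same shallow copy as `Object.clone()`. This is only a Java-level model, not how the VM or Graal implements it, and it says nothing about which classes the new flag excludes:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class AllocationCloneSketch {
    // Hypothetical Cloneable class whose whole state is its instance fields.
    static class Point implements Cloneable {
        int x;
        int y;
        String label;

        Point(int x, int y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }

        @Override
        public Point clone() {
            try {
                return (Point) super.clone(); // Object.clone: shallow field copy
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // "Allocation clone": a fresh instance whose instance fields are copied one by one.
    static Point allocationClone(Point src) {
        Point copy = new Point(0, 0, null); // stands in for a raw allocation
        try {
            for (Field f : Point.class.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue; // only instance state is copied
                }
                f.set(copy, f.get(src));
            }
        } catch (IllegalAccessException e) {
            throw new AssertionError(e);
        }
        return copy;
    }
}
```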

tom

From aleksey.shipilev at oracle.com  Fri Apr 29 22:12:29 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Sat, 30 Apr 2016 01:12:29 +0300
Subject: RFR (M) 8155739: [TESTBUG] VarHandles/Unsafe tests for weakCAS should
	allow spurious failures
Message-ID: <5723DC4D.8090507@oracle.com>

Hi,

I would like to fix a simple testbug in our weakCompareAndSet VarHandles
and Unsafe intrinsics tests. weakCompareAndSet is spec-ed to allow
spurious failures, but current tests do not allow that. This blocks
development and testing on non-x86 platforms.

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8155739

Webrevs:
  http://cr.openjdk.java.net/~shade/8155739/webrev.hs.00/
  http://cr.openjdk.java.net/~shade/8155739/webrev.jdk.00/

The tests are auto-generated, and the substantive changes are in
*.template files. I also removed obsolete generate-unsafe-tests.sh. I
would like to push through hs-comp to expose this to Power and AArch64
folks early.

Testing: x86_64, jdk:java/lang/invoke/VarHandle, hotspot:compiler/unsafe

Thanks,
-Aleksey


From christian.thalinger at oracle.com  Fri Apr 29 23:10:52 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 29 Apr 2016 13:10:52 -1000
Subject: RFR 8155771: [JVMCI] expose JVM_ACC_IS_CLONEABLE_FAST
In-Reply-To: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com>
References: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com>
Message-ID: <0C76F8C6-012E-48F4-934D-4BFB9DC241A3@oracle.com>


> On Apr 29, 2016, at 12:04 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~never/8155771/webrev
+    boolean isAllocationCloneable();
That means "allocation-cloneable", not "allocation is cloneable", right?  It could be interpreted both ways, but the change is good.

> 
> The new flag JVM_ACC_IS_CLONEABLE_FAST controls whether an object can be cloned by simply performing an allocation and copying the fields. Previously all Cloneable objects were assumed to be cloneable in this way.  Add ResolvedJavaType.isAllocationCloneable to expose this notion.  Tested with Graal.
> 
> tom


From vladimir.x.ivanov at oracle.com  Fri Apr 29 23:11:49 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Sat, 30 Apr 2016 02:11:49 +0300
Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM) failed:
	cannot alias-analyze an untyped ptr
Message-ID: <5723EA35.5020809@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8155635

SplitIf transformation can produce untyped pointers when splitting AddP
nodes for unsafe accesses through a Phi which merges non-null & null values:
     AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull))

Proposed fix is to enable oop pointer to raw pointer conversion for 
absolute addresses.

I also experimented with blocking SplitIf transformation in such cases,
but the transformation seems viable and considerably simplifies the 
graph: X-shaped control flow is untangled by eliminating redundant and 
the transformation sharpens types on both branches.

I checked specifically how Phi merges raw & oop pointers after the split 
and it works fine.

Testing: failing test, JPRT, RBT (hs-tier0-comp.js).

Thanks!

Best regards,
Vladimir Ivanov

PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to 
write a simplified test case which triggers SplitIf transformation.

From tom.rodriguez at oracle.com  Fri Apr 29 23:36:35 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 29 Apr 2016 16:36:35 -0700
Subject: RFR 8155771: [JVMCI] expose JVM_ACC_IS_CLONEABLE_FAST
In-Reply-To: <0C76F8C6-012E-48F4-934D-4BFB9DC241A3@oracle.com>
References: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com>
	<0C76F8C6-012E-48F4-934D-4BFB9DC241A3@oracle.com>
Message-ID: <582A1B37-8F4C-4151-8E0C-D77E73236C93@oracle.com>


> On Apr 29, 2016, at 4:10 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> 
>> On Apr 29, 2016, at 12:04 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>> 
>> http://cr.openjdk.java.net/~never/8155771/webrev
> +    boolean isAllocationCloneable();
> That means "allocation-cloneable", not "allocation is cloneable", right?  It could be interpreted both ways, but the change is good.

I knew what I meant so I didn't see alternative readings.  Is there something better? canBeClonedWithAllocation :)

tom

> 
>> 
>> The new flag JVM_ACC_IS_CLONEABLE_FAST controls whether an object can be cloned by simply performing an allocation and copying the fields. Previously all Cloneable objects were assumed to be cloneable in this way.  Add ResolvedJavaType.isAllocationCloneable to expose this notion.  Tested with Graal.
>> 
>> tom
> 

From vladimir.kozlov at oracle.com  Fri Apr 29 23:39:13 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Apr 2016 16:39:13 -0700
Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM)
	failed: cannot alias-analyze an untyped ptr
In-Reply-To: <5723EA35.5020809@oracle.com>
References: <5723EA35.5020809@oracle.com>
Message-ID: <99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com>

I am not comfortable with this fix. You may replace in(Base) != NULL with TOP.
Also, it should not be a RAW pointer (TOP as Base) if it is created by graph transformation from a normal oop pointer.
I think we should track which pointers are really RAW when creating them.

Can you explain why we have such a graph shape, where we access memory after a merge point and one of the merged paths has NULL
as the pointer to the object? There should be a NULL check after the merge, before the memory access, in such a case.

Thanks,
Vladimir K

On 4/29/16 4:11 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8155635
>
> SplitIf transformation can produce untyped pointers when splitting AddP nodes for unsafe accesses through a Phi which
> merges non-null & null values:
>     AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull))
>
> Proposed fix is to enable oop pointer to raw pointer conversion for absolute addresses.
>
> I also experimented with blocking SplitIf transformation in such cases, but the transformation seems viable and
> considerably simplifies the graph: X-shaped control flow is untangled by eliminating redundant and the transformation
> sharpens types on both branches.
>
> I checked specifically how Phi merges raw & oop pointers after the split and it works fine.
>
> Testing: failing test, JPRT, RBT (hs-tier0-comp.js).
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to write a simplified test case which triggers
> SplitIf transformation.

From paul.sandoz at oracle.com  Fri Apr 29 23:42:43 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 29 Apr 2016 16:42:43 -0700
Subject: RFR (M) 8155739: [TESTBUG] VarHandles/Unsafe tests for weakCAS
	should allow spurious failures
In-Reply-To: <5723DC4D.8090507@oracle.com>
References: <5723DC4D.8090507@oracle.com>
Message-ID: <76809325-CB53-499E-885B-9ED4AB0DBBB5@oracle.com>


> On 29 Apr 2016, at 15:12, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
> 
> Hi,
> 
> I would like to fix a simple testbug in our weakCompareAndSet VarHandles
> and Unsafe intrinsics tests. weakCompareAndSet is spec-ed to allow
> spurious failures, but current tests do not allow that. This blocks
> development and testing on non-x86 platforms.
> 
> Bug:
>  https://bugs.openjdk.java.net/browse/JDK-8155739
> 
> Webrevs:
>  http://cr.openjdk.java.net/~shade/8155739/webrev.hs.00/
>  http://cr.openjdk.java.net/~shade/8155739/webrev.jdk.00/
> 

Looks good.

A small tweak, if you wish:

#if[JdkInternalMisc]
    static final int WEAK_ATTEMPTS = Integer.getInteger("weakAttempts", 10);
#end[JdkInternalMisc]

which avoids changes to the SunMiscUnsafe* tests.
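
The retry pattern behind such a WEAK_ATTEMPTS constant might look roughly like this. It is a hypothetical sketch, not the generated test code: the class WeakCasRetry and the method weakCasWithRetries are made up, and it uses the public java.lang.invoke.VarHandle API. Because a single weakCompareAndSet may fail spuriously, only failing on every one of the attempts is treated as a real failure.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch of a weak-CAS check that tolerates spurious failures.
final class WeakCasRetry {
    // Same system property name and default as in the snippet above.
    static final int WEAK_ATTEMPTS = Integer.getInteger("weakAttempts", 10);

    static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(WeakCasRetry.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    volatile int value;

    // Returns true if any of WEAK_ATTEMPTS weak CAS attempts succeeds.
    // A single false result is allowed by the spec and is not a failure.
    static boolean weakCasWithRetries(WeakCasRetry holder, int expected, int newValue) {
        for (int i = 0; i < WEAK_ATTEMPTS; i++) {
            if (VALUE.weakCompareAndSet(holder, expected, newValue)) {
                return true;
            }
        }
        return false;
    }
}
```

On x86 the first attempt will normally succeed; the retries matter on platforms where weak CAS can fail spuriously, presumably the Power and AArch64 ports mentioned in the request.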

Paul.

> The tests are auto-generated, and the substantive changes are in
> *.template files. I also removed obsolete generate-unsafe-tests.sh. I
> would like to push through hs-comp to expose this to Power and AArch64
> folks early.
> 
> Testing: x86_64, jdk:java/lang/invoke/VarHandle, hotspot:compiler/unsafe
> 
> Thanks,
> -Aleksey
> 

From tomasz.wojtowicz at intel.com  Fri Apr 29 23:57:41 2016
From: tomasz.wojtowicz at intel.com (Wojtowicz, Tomasz)
Date: Fri, 29 Apr 2016 23:57:41 +0000
Subject: RFR (M): 8154974: AVX-512 equipped inflate, has_negatives &
	compress intrinsics
Message-ID: <3616187E21868C40AD1B36D41D29F4C1520B04B1@FMSMSX106.amr.corp.intel.com>

I would like to contribute the following change:
Review details
Review Title:  AVX-512 equipped inflate, has_negatives & compress intrinsics
Review ID:     #8154974
Diff:          http://cr.openjdk.java.net/~vdeshpande/8154974/webrev.00/
Description:   512-bit-wide upgrades of the previously 128/256-bit-wide implementations, using mask registers for tail computations.
Link:          https://bugs.openjdk.java.net/browse/JDK-8154974
Author:        Tomasz Wojtowicz
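
For readers less familiar with these intrinsics, a scalar Java reference of what they compute may help. This is a sketch under assumptions, not the JDK source or the patch. I assume the intrinsics back StringCoding.hasNegatives, StringLatin1.inflate and StringUTF16.compress. The class Latin1Kernels and its signatures are hypothetical, and the 0-on-failure return convention of compress is an assumption. hasNegativesBlocked only mimics the 64-byte chunk plus tail structure in scalar code; it does not use mask registers.

```java
// Hypothetical scalar reference versions of what the three intrinsics compute.
final class Latin1Kernels {
    // True if any byte in ba[off, off + len) is negative, i.e. not 7-bit ASCII.
    static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Same result, structured like a 64-byte (512-bit) vector loop: whole
    // 64-byte chunks first, then a shorter tail. The tail is the part an
    // AVX-512 version can handle with a masked operation instead of a scalar loop.
    static boolean hasNegativesBlocked(byte[] ba, int off, int len) {
        final int chunk = 64;
        final int end = off + len;
        int i = off;
        for (; end - i >= chunk; i += chunk) {
            int orBits = 0;
            for (int j = 0; j < chunk; j++) {
                orBits |= ba[i + j]; // sign extension makes orBits negative if any byte is
            }
            if (orBits < 0) {
                return true;
            }
        }
        for (; i < end; i++) { // tail
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Zero-extends Latin-1 bytes to UTF-16 chars.
    static void inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            dst[dstOff + i] = (char) (src[srcOff + i] & 0xFF);
        }
    }

    // Narrows UTF-16 chars to Latin-1 bytes. Returns len on success, or 0 as soon
    // as a char above 0xFF is seen (assumed return convention).
    static int compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[srcOff + i];
            if (c > 0xFF) {
                return 0;
            }
            dst[dstOff + i] = (byte) c;
        }
        return len;
    }
}
```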

--
Thank you,

Tomek

From vladimir.x.ivanov at oracle.com  Sat Apr 30 00:24:48 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Sat, 30 Apr 2016 03:24:48 +0300
Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM)
	failed: cannot alias-analyze an untyped ptr
In-Reply-To: <99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com>
References: <5723EA35.5020809@oracle.com>
	<99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com>
Message-ID: <5723FB50.6050301@oracle.com>

Thanks for the feedback, Vladimir.

On 4/30/16 2:39 AM, Vladimir Kozlov wrote:
> I am not comfortable with this fix. You may replace in(Base) != NULL
> with TOP.
Do you see any cases when it is possible?
I don't see any sense in an absolute address with a valid heap base.

I can check that in(Base) == in(Address) == NULL. It will fix the 
problem as well.

> Also, it should not be a RAW pointer (TOP as Base) if it is created by
> graph transformation from a normal oop pointer.
> I think we should track which pointers are really RAW when creating them.
>
> Can you explain why we have such a graph shape, where we access memory
> after a merge point and one of the merged paths has NULL as the pointer
> to the object? There should be a NULL check after the merge, before the
> memory access, in such a case.
It's not necessarily a normal oop pointer. Double-register addressing 
mode is the source of such shapes. Consider the following example:

   Object o = (flag ? INSTANCE : null);
   long off = (flag ? F_OFFSET : ADDR);
   UNSAFE.getLong(o, off);

is translated into:

   LoadL mem (AddP (Phi #NULL #NonNull) off)

If such AddP is split through the Phi, it turns into (AddP #NULL #NULL 
off) and (AddP #NonNull #NonNull off). The former is untyped and causes 
problems later.
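
At the Java level, the access pattern above can be written as a self-contained sketch. It assumes sun.misc.Unsafe is obtained reflectively through its theUnsafe field. INSTANCE, F_OFFSET and ADDR follow the snippet; the class DoubleRegisterAccess, its field f and the stored values are hypothetical. It only shows that both paths are legal reads, one relative to an object and one at an absolute address with a null base. It does not by itself produce the X-shaped control flow, the SplitIf transformation or the assert.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo of double-register addressing with a base that is either
// a non-null object or null.
final class DoubleRegisterAccess {
    static final Unsafe UNSAFE;
    static final long F_OFFSET; // field offset of 'f' within an instance
    static final long ADDR;     // absolute off-heap address
    static final DoubleRegisterAccess INSTANCE = new DoubleRegisterAccess();

    long f = 42L;

    static {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            UNSAFE = (Unsafe) theUnsafe.get(null);
            F_OFFSET = UNSAFE.objectFieldOffset(DoubleRegisterAccess.class.getDeclaredField("f"));
            ADDR = UNSAFE.allocateMemory(Long.BYTES); // never freed; acceptable for a demo
            UNSAFE.putLong(ADDR, 7L);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Base is a non-null oop with a field offset, or null with an absolute
    // address: the (Phi #NULL #NotNull) base described above.
    static long read(boolean flag) {
        Object o = flag ? INSTANCE : null;
        long off = flag ? F_OFFSET : ADDR;
        return UNSAFE.getLong(o, off);
    }
}
```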

What I can't replicate is how X-shaped control flow eligible for SplitIf 
transformation is produced.

In the failing case, initial null & exact type checks of an oop local
(on OSR entry) merge into a redundant X-shaped block. Unsafe accesses use
the local as a base later.

Best regards,
Vladimir Ivanov

>
> On 4/29/16 4:11 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8155635
>>
>> SplitIf transformation can produce untyped pointers when splitting AddP
>> nodes for unsafe accesses through a Phi which
>> merges non-null & null values:
>>     AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull))
>>
>> Proposed fix is to enable oop pointer to raw pointer conversion for
>> absolute addresses.
>>
>> I also experimented with blocking SplitIf transformation in such
>> cases, but the transformation seems viable and
>> considerably simplifies the graph: X-shaped control flow is untangled
>> by eliminating redundant and the transformation
>> sharpens types on both branches.
>>
>> I checked specifically how Phi merges raw & oop pointers after the
>> split and it works fine.
>>
>> Testing: failing test, JPRT, RBT (hs-tier0-comp.js).
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to
>> write a simplified test case which triggers
>> SplitIf transformation.

From vladimir.kozlov at oracle.com  Sat Apr 30 01:30:28 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Apr 2016 18:30:28 -0700
Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM)
	failed: cannot alias-analyze an untyped ptr
In-Reply-To: <5723FB50.6050301@oracle.com>
References: <5723EA35.5020809@oracle.com>
	<99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com>
	<5723FB50.6050301@oracle.com>
Message-ID: <57240AB4.8040406@oracle.com>

On 4/29/16 5:24 PM, Vladimir Ivanov wrote:
> Thanks for the feedback, Vladimir.
>
> On 4/30/16 2:39 AM, Vladimir Kozlov wrote:
>> I am not comfortable with this fix. You may replace in(Base) != NULL
>> with TOP.
> Do you see any cases when it is possible?
> I don't see any sense in an absolute address with a valid heap base.
>
> I can check that in(Base) == in(Address) == NULL. It will fix the problem as well.
>
>> Also, it should not be a RAW pointer (TOP as Base) if it is created by
>> graph transformation from a normal oop pointer.
>> I think we should track which pointers are really RAW when creating them.
>>
>> Can you explain why we have such a graph shape, where we access memory
>> after a merge point and one of the merged paths has NULL as the pointer
>> to the object? There should be a NULL check after the merge, before the
>> memory access, in such a case.
> It's not necessarily a normal oop pointer. Double-register addressing mode is the source of such shapes. Consider the following example:
>
>    Object o = (flag ? INSTANCE : null);
>    long off = (flag ? F_OFFSET : ADDR);
>    UNSAFE.getLong(o, off);

I think for this graph shape the C2 type system gave up and dropped the type to the general Ptr::BOTTOM, because it does not know that 'off' can be an address and not a normal offset on the dead path where the base is NULL. We usually do:

long l = flag ? o.field : UNSAFE.getLong(addr);
And for UNSAFE.getLong(addr) we generate a Raw pointer address (make_unsafe_address()).

What I am saying is that C2 treats a long value as an address only when it is used as a direct parameter of an Unsafe access. See LibraryCallKit::classify_unsafe_addr().
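
For contrast, the idiom described above can be written as a small sketch: the heap value comes from an ordinary field load, and the absolute address only ever reaches the single-argument UNSAFE.getLong(addr). This is a hypothetical example: the class RawAddressIdiom, its field and the stored value are made up, and sun.misc.Unsafe is obtained reflectively.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: object access and raw-address access kept separate.
final class RawAddressIdiom {
    static final Unsafe UNSAFE;
    static final long ADDR; // absolute off-heap address

    long field = 42L;

    static {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            UNSAFE = (Unsafe) theUnsafe.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        ADDR = UNSAFE.allocateMemory(Long.BYTES); // never freed; acceptable for a demo
        UNSAFE.putLong(ADDR, 7L);
    }

    // The heap value is a normal field load; the address is only passed to
    // the single-argument getLong(long), so it is never combined with an oop base.
    static long read(boolean flag, RawAddressIdiom o, long addr) {
        return flag ? o.field : UNSAFE.getLong(addr);
    }
}
```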

Who produces ADDR? Maybe we can set a flag to indicate that it is a RAW address.

We should discuss it with John who is *the* expert in this.

Thanks,
Vladimir

>
> is translated into:
>
>    LoadL mem (AddP (Phi #NULL #NonNull) off)
>
> If such AddP is split through the Phi, it turns into (AddP #NULL #NULL off) and (AddP #NonNull #NonNull off). The former is untyped and causes problems later.
>
> What I can't replicate is how X-shaped control flow eligible for SplitIf transformation is produced.
>
> In the failing case, initial null & exact type checks of an oop local (on OSR entry) merge into a redundant X-shaped block. Unsafe accesses use the local as a base later.
>
> Best regards,
> Vladimir Ivanov
>
>>
>> On 4/29/16 4:11 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8155635
>>>
>>> SplitIf transformation can produce untyped pointers when splitting AddP
>>> nodes for unsafe accesses through a Phi which
>>> merges non-null & null values:
>>>     AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull))
>>>
>>> Proposed fix is to enable oop pointer to raw pointer conversion for
>>> absolute addresses.
>>>
>>> I also experimented with blocking SplitIf transformation in such
>>> cases, but the transformation seems viable and
>>> considerably simplifies the graph: X-shaped control flow is untangled
>>> by eliminating redundant and the transformation
>>> sharpens types on both branches.
>>>
>>> I checked specifically how Phi merges raw & oop pointers after the
>>> split and it works fine.
>>>
>>> Testing: failing test, JPRT, RBT (hs-tier0-comp.js).
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to
>>> write a simplified test case which triggers
>>> SplitIf transformation.

From edward.nevill at gmail.com  Sat Apr 30 08:17:29 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Sat, 30 Apr 2016 09:17:29 +0100
Subject: 8155790: aarch64: debug VM fails to start after 8155617
Message-ID: <1462004249.32168.11.camel@mint>

Hi,

Please review the following webrev

http://cr.openjdk.java.net/~enevill/8155790/

JIRA: https://bugs.openjdk.java.net/browse/JDK-8155790


This fixes an issue where the debug build fails to start after 8155617 with the error

# Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/share/vm/runtime/stubRoutines.cpp:344), pid=16911, tid=16912 
# assert(s.body[i] == 32) failed: what? 

The problem is that I failed to observe that the base register must point immediately after the end of the block being zeroed on exit from the zero loop.

In addition, the base register can sometimes be byte-aligned due to vectorisation, and this was not handled.
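
The exit condition described above can be modeled in Java. This is a hypothetical model, not the AArch64 stub from 8155617: the class ZeroLoopModel is made up, memory is a byte[], and "alignment" is measured on array indices rather than real addresses. It zeroes a block that may start at a byte-aligned position and returns the moving base cursor, which should end up just past the end of the block.

```java
import java.util.Arrays;

// Hypothetical model: memory is a byte[] and alignment is measured on indices.
final class ZeroLoopModel {
    // Zeroes mem[base, base + len) and returns the final value of the moving
    // base cursor, which should be base + len: just past the zeroed block.
    static int zeroBlock(byte[] mem, int base, int len) {
        final int end = base + len;
        // Head: the start may be only byte-aligned, so store single bytes
        // until the cursor reaches an 8-byte boundary.
        while (base < end && (base & 7) != 0) {
            mem[base++] = 0;
        }
        // Body: one simulated 8-byte store per iteration.
        while (end - base >= 8) {
            Arrays.fill(mem, base, base + 8, (byte) 0);
            base += 8;
        }
        // Tail: whatever is left over, byte by byte.
        while (base < end) {
            mem[base++] = 0;
        }
        return base;
    }
}
```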

The above webrev fixes this. I have tested a debug build with jtreg hotspot and langtools.

OK to push?

Ed.


From aph at redhat.com  Sat Apr 30 08:39:57 2016
From: aph at redhat.com (Andrew Haley)
Date: Sat, 30 Apr 2016 09:39:57 +0100
Subject: 8155790: aarch64: debug VM fails to start after 8155617
In-Reply-To: <1462004249.32168.11.camel@mint>
References: <1462004249.32168.11.camel@mint>
Message-ID: <57246F5D.7090609@redhat.com>

On 30/04/16 09:17, Edward Nevill wrote:
> http://cr.openjdk.java.net/~enevill/8155790/
> 
> JIRA: https://bugs.openjdk.java.net/browse/JDK-8155790

OK, thanks.

Andrew.