From rwestrel at redhat.com Sun Dec 1 14:54:49 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Sun, 01 Dec 2019 15:54:49 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <871ru2diu9.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> Message-ID: <8736e4ayfa.fsf@redhat.com> > http://cr.openjdk.java.net/~roland/8234350/webrev.00/ Anyone for a second review? Thanks, Roland. From patrick at os.amperecomputing.com Mon Dec 2 03:18:58 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Mon, 2 Dec 2019 03:18:58 +0000 Subject: RFR (trivial): 8234228: AArch64: Clean up redundant temp vars in generate_compare_long_string_different_encoding In-Reply-To: References: Message-ID: Ping... Please help review, thanks. Regards Patrick -----Original Message----- From: hotspot-compiler-dev On Behalf Of Patrick Zhang OS Sent: Friday, November 15, 2019 6:54 PM To: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RFR (trivial): 8234228: AArch64: Clean up redundant temp vars in generate_compare_long_string_different_encoding Hi Reviewers, This is a simple patch which cleans up some redundant temp vars and related instructions in generate_compare_long_string_different_encoding. JBS: https://bugs.openjdk.java.net/browse/JDK-8234228 Webrev: http://cr.openjdk.java.net/~qpzhang/8234228/webrev.01 In generate_compare_long_string_different_encoding, the two Register vars strU and strL were used to record the pointers of the last 4 characters for the final comparisons. strU has been no use since the latest code updates as the chars got pre-loaded (r12) by compare_string_16_x_LU early, and strL is redundant too since the pointer is available in r11. Cleaning up these can save two add, two temp vars, and replace two sub with mov. In addition, r10 in compare_string_16_x_LU is not used, cleaned the temp var too. Tested jtreg tier1, and hotspot runtime/compiler, no new failures found. Double checked with string intrinsics cases under [1], no regression found. Ran [2] CompareToBench LU/UL as performance check, no regression found, and slight gains with some input sizes [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar Regards Patrick From tobias.hartmann at oracle.com Mon Dec 2 05:57:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 06:57:01 +0100 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion In-Reply-To: References: Message-ID: Thanks Vladimir! Best regards, Tobias On 29.11.19 15:19, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> Writing an (integer) value to a boolean, byte, char or short field includes an implicit narrowing >> conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit field loads by caching >> and reusing the last written value. The problem is that this value is not necessarily converted to >> the field type and we end up using an incorrect value. >> >> For example, for the field store/load in testShort, C1 emits: >> ?? [...] >> ?? 0x00007f0fc582bd6c:?? mov??? %dx,0x12(%rsi) >> ?? 0x00007f0fc582bd70:?? mov??? %rdx,%rax >> ?? [...] 
>> >> The field load has been eliminated and the non-converted integer value (%rdx) is returned. >> >> The fix is to emit an explicit conversion to get the correct field value after the write: >> ?? [...] >> ?? 0x00007ff07982bd6c:?? mov??? %dx,0x12(%rsi) >> ?? 0x00007ff07982bd70:?? movswl %dx,%edx >> ?? 0x00007ff07982bd73:?? mov??? %rdx,%rax >> ?? [...] >> >> Thanks, >> Tobias >> >> [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.putfield >> From nick.gasson at arm.com Mon Dec 2 07:55:14 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Mon, 2 Dec 2019 15:55:14 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> Message-ID: <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> On 29/11/2019 18:10, Andrew Haley wrote: >> How about we exit with a fatal error if we can't find a suitably aligned >> region? Then we can remove the code in decode_klass_non_null that uses >> R27 and this patch is much simpler. That code path is poorly tested at >> the moment so it seems risky to leave it in. With a hard error at least >> users will report it to us so we can fix it. > > That is starting to sound very attractive. With a 64-bit address space I'm > finding it very hard to imagine a scenario in which we don't find a > suitable address. I think AOT-compiled code would still be OK, because it > generates different code, but we'd have to do some testing. > There's another little wrinkle: even after updating the shared metaspace to use the search algorithm to find a 4G-aligned location, we can still end up with something like: CompressedKlassPointers::base() => 0x1100000000 CompressedKlassPointers::shift() => 3 (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) Here we can't use EOR because 0x1100000000 doesn't fit in the immediate, and we can't use MOVK because the shift is non-zero. I think the solution is to adjust the algorithm to search in increments of (4 << LogKlassAlignmentInBytes)*G once we hit 32G (the point at which we can no longer use a zero base). Then we can decode like this: const uint64_t shifted_base = (uint64_t)CompressedKlassPointers::base() >> CompressedKlassPointers::shift(); guarantee((shifted_base & 0xffffffff) == 0, "compressed class base bad alignment"); if (dst != src) movw(dst, src); movk(dst, shifted_base >> 32, 32); if (CompressedKlassPointers::shift() != 0) { lsl(dst, dst, LogKlassAlignmentInBytes); } What do you think? I'll do some testing with AOT later. Thanks, Nick From martin.doerr at sap.com Mon Dec 2 10:14:23 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 2 Dec 2019 10:14:23 +0000 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion In-Reply-To: References: Message-ID: +1 Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Tobias Hartmann > Sent: Montag, 2. Dezember 2019 06:57 > To: Vladimir Ivanov ; hotspot compiler > > Subject: Re: [14] RFR(S): 8234617: C1: Incorrect result of field load due to > missing narrowing conversion > > Thanks Vladimir! 
> > Best regards, > Tobias > > On 29.11.19 15:19, Vladimir Ivanov wrote: > > > >> http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ > > > > Looks good. > > > > Best regards, > > Vladimir Ivanov > > > >> > >> Writing an (integer) value to a boolean, byte, char or short field includes > an implicit narrowing > >> conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit > field loads by caching > >> and reusing the last written value. The problem is that this value is not > necessarily converted to > >> the field type and we end up using an incorrect value. > >> > >> For example, for the field store/load in testShort, C1 emits: > >> ?? [...] > >> ?? 0x00007f0fc582bd6c:?? mov??? %dx,0x12(%rsi) > >> ?? 0x00007f0fc582bd70:?? mov??? %rdx,%rax > >> ?? [...] > >> > >> The field load has been eliminated and the non-converted integer value > (%rdx) is returned. > >> > >> The fix is to emit an explicit conversion to get the correct field value after > the write: > >> ?? [...] > >> ?? 0x00007ff07982bd6c:?? mov??? %dx,0x12(%rsi) > >> ?? 0x00007ff07982bd70:?? movswl %dx,%edx > >> ?? 0x00007ff07982bd73:?? mov??? %rdx,%rax > >> ?? [...] > >> > >> Thanks, > >> Tobias > >> > >> [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms- > 6.html#jvms-6.5.putfield > >> From tobias.hartmann at oracle.com Mon Dec 2 11:28:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 12:28:31 +0100 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion In-Reply-To: References: Message-ID: Thanks Martin! Best regards, Tobias On 02.12.19 11:14, Doerr, Martin wrote: > +1 > > Best regards, > Martin > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Tobias Hartmann >> Sent: Montag, 2. Dezember 2019 06:57 >> To: Vladimir Ivanov ; hotspot compiler >> >> Subject: Re: [14] RFR(S): 8234617: C1: Incorrect result of field load due to >> missing narrowing conversion >> >> Thanks Vladimir! >> >> Best regards, >> Tobias >> >> On 29.11.19 15:19, Vladimir Ivanov wrote: >>> >>>> http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ >>> >>> Looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> Writing an (integer) value to a boolean, byte, char or short field includes >> an implicit narrowing >>>> conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit >> field loads by caching >>>> and reusing the last written value. The problem is that this value is not >> necessarily converted to >>>> the field type and we end up using an incorrect value. >>>> >>>> For example, for the field store/load in testShort, C1 emits: >>>> ?? [...] >>>> ?? 0x00007f0fc582bd6c:?? mov??? %dx,0x12(%rsi) >>>> ?? 0x00007f0fc582bd70:?? mov??? %rdx,%rax >>>> ?? [...] >>>> >>>> The field load has been eliminated and the non-converted integer value >> (%rdx) is returned. >>>> >>>> The fix is to emit an explicit conversion to get the correct field value after >> the write: >>>> ?? [...] >>>> ?? 0x00007ff07982bd6c:?? mov??? %dx,0x12(%rsi) >>>> ?? 0x00007ff07982bd70:?? movswl %dx,%edx >>>> ?? 0x00007ff07982bd73:?? mov??? %rdx,%rax >>>> ?? [...] 
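For concreteness, a minimal Java sketch of the store-then-load shape being discussed (hypothetical names, not the actual regression test; whether C1 actually elides the narrowing for a given variant depends on the surrounding code):

public class ShortFieldStoreLoad {
    short f;

    // Store an int-range value into the short field, then read the field back.
    // With -XX:+EliminateFieldAccess the getfield may be replaced by the value
    // cached from the store; the fix ensures that cached value is first
    // narrowed (sign-extended from 16 bits) to the field type.
    short testShort(int v) {
        f = (short) v;
        return f;
    }

    public static void main(String[] args) {
        ShortFieldStoreLoad t = new ShortFieldStoreLoad();
        for (int i = 0; i < 100_000; i++) {   // warm up so C1 compiles testShort
            short r = t.testShort(0x12345678);
            if (r != (short) 0x12345678) {
                throw new AssertionError("got " + r);
            }
        }
    }
}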
>>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms- >> 6.html#jvms-6.5.putfield >>>> From markus.gronlund at oracle.com Mon Dec 2 11:57:38 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Mon, 2 Dec 2019 03:57:38 -0800 (PST) Subject: RFR(M): 8216041: [Event Request] - Deoptimization Message-ID: Greetings, Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. This information has also been added to the Compilation event. The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. [1] https://bugs.openjdk.java.net/browse/JDK-8225554 [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html Thank you Markus From tobias.hartmann at oracle.com Mon Dec 2 13:07:34 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 14:07:34 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: References: Message-ID: <88c9da01-6865-4a92-67fd-2f52d6cfe84b@oracle.com> Hi Christian, looks reasonable to me. Best regards, Tobias On 20.11.19 15:14, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8231501 > http://cr.openjdk.java.net/~chagedorn/8231501/webrev.00/ > > The bug could be traced back to the concurrent cleaning of method data with its extra data in > MethodData::clean_method_data() and the loading/copying of extra data for the ci method data in > ciMethodData::load_extra_data(). I reproduced the bug by using the test [1] which extensively cleans > method data by using the whitebox API [2]. > > Before loading and copying the extra data from the MDO to the ciMDO in > ciMethodData::load_extra_data(), the metadata is prepared in a fixed-point iteration by cleaning all > SpeculativeTrapData entries of methods whose klasses are unloaded [3]. If it encounters such a dead > entry it releases the extra data lock (due to ranking issues) and tries again later [4]. This > release of the lock triggers the bug: There can be cases where one thread A is waiting in the > whitebox API method to get the extra data lock [2] to clean the extra data for the very same MDO for > which another thread B just released the lock at [4]. If that MDO actually contained > SpeculativeTrapData entries, then thread A cleaned those but the ciMDO, which thread B is preparing, > still contains the uncleaned old MDO extra data (because thread B only made a snapshot of the MDO > earlier at [5]). Things then go wrong when thread B can reacquire the lock after thread A. It tries > to load the now cleaned extra data and immediately finishes at [6] since there are no > SpeculativeTrapData entries anymore. 
It copied a single entry with tag DataLayout::no_tag [7] to the > ciMDO which actually contained a SpeculativeTrapData entry. This results in a half way cleared entry > (since a SpeculativeTrapData entry has an additional cell for the method) and possible other > remaining SpeculativeTrapData entries: > > > Let's assume a little-endian ordering and that both 0x00007fff... addresses are real pointers to > methods. Tag 13 (0x0d) is used for SpeculativeTrapData and dp points to the first extra data entry: > > ciMDO extra data before thread B releases the lock at [4] (same extra data for MDO and ciMDO): > 0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 > dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63 > dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68 > dp+32: tag = 0 -> end of extra data > > MDO extra data after thread B reacquires the lock and thread A cleaned the MDO (ciMDO extra data is > unchanged): > 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 > dp: tag = 0 -> end of extra data > > > Returning at [6] when the extra data loading from MDO to ciMDO is finished: > MDO extra data: > 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 > dp: tag = 0 -> end of extra data > > ciMDO extra data, only copied the first no_tag entry from MDO at [7] (8 bytes): > 0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 > dp: tag = 0 -> next entry = dp+8 > dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal... > > > The next time the ciMDO extra data is iterated, for example by using MethodData::next_extra(), it > reads tag 99 after processing the first no_tag entry and jumping to the value at offset 8 which > causes a crash since there is no tag 99 available. > > > The fix is to completely zero out the current and all following SpeculativeTrapData entries if we > encounter a no_tag in the MDO but a speculative_trap_data_tag tag in the ciMDO. There are also other > cases where the method data is cleaned. Thus the bug is not only related to the whitebox API usage > but occurs very rarely. > > Thank you! > > Best regards, > Christian > > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java > > [2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137 > [3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137 > [4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115 > [5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219 > [6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191 > [7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176 From tobias.hartmann at oracle.com Mon Dec 2 13:38:42 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 14:38:42 +0100 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> Message-ID: Hi Vladimir, On 29.11.19 16:28, Vladimir Ivanov wrote: > ? http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/ This looks good to me. 
Best regards, Tobias From vladimir.x.ivanov at oracle.com Mon Dec 2 14:56:44 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 2 Dec 2019 17:56:44 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7776c872-e5a1-2dc4-4cf7-b1c733b6a314@redhat.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com> <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com> <7776c872-e5a1-2dc4-4cf7-b1c733b6a314@redhat.com> Message-ID: <07e1d2ae-a96a-a551-7db6-85d260223e64@oracle.com> > Please carry on with your changes then. Just in case: if the bug severely affects Shenandoah and the fix requires time, as the stop-the-gap solution I can keep barriers around off-heap accesses when Shenandoah is used until proper fix arrives. Best regards, Vladimir Ivanov > > Thanks, > Roman > >> Roman, >> >> JDK-8220714 looks like a bug in Shenandoah barrier expansion. >> >> I slightly modified the test to simplify the analysis [1]. >> >> While running the test [2] I'm seeing the following: >> >> ? 127 LoadI? === 44?? 7 125 >> ? 160 StoreI === 44?? 7? 91 127 >> ?? 94 LoadI? === 44?? 7? 91 >> ? 193 StoreI === 44 160 125? 94 >> >> After expansion is over, it looks as follows: >> >> ? 127 LoadI? ===? 44 209 125 >> ? 160 StoreI ===? 44 209? 91 127 >> ?? 94 LoadI? ===? 44 160? 91 >> ? 193 StoreI ===? 44 160 125? 94 >> >> Note that 94 LoadI depends on 160 StoreI memory now. Before the >> expansion they were independent (7 Parm == initial memory state). >> >> And then 94 goes away, since it now reads updated value: >> >> >> int???????? 127??? LoadI??? ===? 44? 209? 125? [[ 160? 193 ]] >> @rawptr:BotPTR, idx=Raw; unsafe #int (does not depend only on test) >> !orig=[94] !jvms: Unsafe::getInt @ bci:3 >> TestUnsafeOffheapSwap$Memory::getInt @ bci:14 >> TestUnsafeOffheapSwap$Memory::swap @ bci:10 >> TestUnsafeOffheapSwap::testUnsafeHelper @ bci:7 >> >> The final graph is evidently wrong: >> >> ? 127 LoadI? ===? 44 209 125 >> ? 160 StoreI ===? 44 209? 91 127 >> ? 193 StoreI ===? 44 160 125 127 >> >> Let me know how you want to proceed with it. >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> >> diff --git >> a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> --- a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> +++ b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> @@ -57,6 +57,16 @@ >> ???????? } >> ???? } >> >> +??? static void testUnsafeHelper(int i) { >> +??????? mem.swap(i - 1, i); >> +??? } >> + >> +??? static void testArrayHelper(int[] arr, int i) { >> +??????? int tmp = arr[i - 1]; >> +??????? arr[i - 1] = arr[i]; >> +??????? arr[i] = tmp; >> +??? } >> + >> ???? static void test() { >> ???????? Random rnd = new Random(SEED); >> ???????? for (int i = 0; i < SIZE; i++) { >> @@ -72,10 +82,8 @@ >> ???????? } >> >> ???????? for (int i = 1; i < SIZE; i++) { >> -??????????? mem.swap(i - 1, i); >> -??????????? int tmp = arr[i - 1]; >> -??????????? arr[i - 1] = arr[i]; >> -??????????? arr[i] = tmp; >> +??????????? testUnsafeHelper(i); >> +??????????? testArrayHelper(arr, i); >> ???????? } >> >> ???????? for (int i = 0; i < SIZE; i++) { >> >> [2] $ java -cp >> JTwork/classes/gc/shenandoah/compiler/TestUnsafeOffheapSwap.d/ >> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED >> -XX:-UseOnStackReplacement -XX:-BackgroundCompilation >> -XX:-TieredCompilation? 
-XX:+UnlockExperimentalVMOptions >> -XX:+UseShenandoahGC -XX:+PrintCompilation -XX:CICompilerCount=1 >> -XX:CompileCommand=quiet >> -XX:CompileCommand=compileonly,*::testUnsafeHelper >> -XX:CompileCommand=print,*::testUnsafeHelper -XX:PrintIdealGraphLevel=0 >> -XX:-VerifyOops -XX:-UseCompressedOops TestUnsafeOffheapSwap >> >> >> Best regards, >> Vladimir Ivanov >> >> On 29.11.2019 20:18, Vladimir Ivanov wrote: >>> >>>> Does it affect this: >>>> https://bugs.openjdk.java.net/browse/JDK-8220714 >>> >>> Good point, Roman. Proposed patch breaks the fix for JDK-8220714. >>> >>> I'll investigate what happens there and come back with a revised fix. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>>>> >>>>> There were a number of fixes in C2 support for unsafe accesses recently >>>>> which led to additional memory barriers around them. It improved >>>>> stability, but in some cases it was redundant. One of important use >>>>> cases which regressed is off-heap accesses [1]. The barriers around >>>>> them >>>>> are redundant because they are serialized on raw memory and don't >>>>> intersect with any on-heap accesses. >>>>> >>>>> Proposed fix skips memory barriers around unsafe accesses which are >>>>> provably off-heap (base == NULL). >>>>> >>>>> It (almost completely) recovers performance on the microbenchmark >>>>> provided in JDK-8224182 [1]. >>>>> >>>>> Testing: tier1-6. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >>>>> >>>> >> > From vladimir.x.ivanov at oracle.com Mon Dec 2 14:57:00 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 2 Dec 2019 17:57:00 +0300 Subject: [14] RFR (S): 8234923: Missed call_site_target nmethod dependency for non-fully initialized ConstantCallSite instance In-Reply-To: References: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com> Message-ID: <0e177195-9e3e-661e-5dd0-e613f02747c1@oracle.com> Thank you, John. Best regards, Vladimir Ivanov On 30.11.2019 02:30, John Rose wrote: > Reviewed. > >> On Nov 29, 2019, at 7:55 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8234923/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234923 > From erik.gahlin at oracle.com Mon Dec 2 15:00:05 2019 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 2 Dec 2019 16:00:05 +0100 Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: References: Message-ID: <42a6af4e-d816-ebf1-9519-b1de6f5606ea@oracle.com> Hi Markus, Looks good, but could you change the names in metadata.xml to DeoptimizationReason and DeoptimizationAction? No need to send out new webrev. Thanks Erik On 2019-12-02 12:57, Markus Gronlund wrote: > Greetings, > > Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. > > Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. > > Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 > Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg > Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. 
This information has also been added to the Compilation event. The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. > > [1] https://bugs.openjdk.java.net/browse/JDK-8225554 > [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html > > Thank you > Markus From aph at redhat.com Mon Dec 2 15:34:33 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 2 Dec 2019 10:34:33 -0500 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> Message-ID: <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> On 12/2/19 7:55 AM, Nick Gasson wrote: > On 29/11/2019 18:10, Andrew Haley wrote: >>> How about we exit with a fatal error if we can't find a suitably aligned >>> region? Then we can remove the code in decode_klass_non_null that uses >>> R27 and this patch is much simpler. That code path is poorly tested at >>> the moment so it seems risky to leave it in. With a hard error at least >>> users will report it to us so we can fix it. >> >> That is starting to sound very attractive. With a 64-bit address space I'm >> finding it very hard to imagine a scenario in which we don't find a >> suitable address. I think AOT-compiled code would still be OK, because it >> generates different code, but we'd have to do some testing. >> > > There's another little wrinkle: even after updating the shared metaspace > to use the search algorithm to find a 4G-aligned location, we can still > end up with something like: > > CompressedKlassPointers::base() => 0x1100000000 > CompressedKlassPointers::shift() => 3 > > (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) I remember that. I don't know why; it's not much use to us on AArch64. > Here we can't use EOR because 0x1100000000 doesn't fit in the immediate, > and we can't use MOVK because the shift is non-zero. > > I think the solution is to adjust the algorithm to search in increments > of (4 << LogKlassAlignmentInBytes)*G once we hit 32G (the point at which > we can no longer use a zero base). Then we can decode like this: > > const uint64_t shifted_base = > (uint64_t)CompressedKlassPointers::base() >> > CompressedKlassPointers::shift(); > guarantee((shifted_base & 0xffffffff) == 0, "compressed class base > bad alignment"); > > if (dst != src) movw(dst, src); > movk(dst, shifted_base >> 32, 32); > > if (CompressedKlassPointers::shift() != 0) { > lsl(dst, dst, LogKlassAlignmentInBytes); > } > > What do you think? Could work, but can you also try disabling the shift with CDS? It doesn't do us much good and it bloats code. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jianglizhou at google.com Mon Dec 2 16:05:37 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 2 Dec 2019 08:05:37 -0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> Message-ID: On Mon, Dec 2, 2019 at 7:34 AM Andrew Haley wrote: > > On 12/2/19 7:55 AM, Nick Gasson wrote: > > On 29/11/2019 18:10, Andrew Haley wrote: > >>> How about we exit with a fatal error if we can't find a suitably aligned > >>> region? Then we can remove the code in decode_klass_non_null that uses > >>> R27 and this patch is much simpler. That code path is poorly tested at > >>> the moment so it seems risky to leave it in. With a hard error at least > >>> users will report it to us so we can fix it. > >> > >> That is starting to sound very attractive. With a 64-bit address space I'm > >> finding it very hard to imagine a scenario in which we don't find a > >> suitable address. I think AOT-compiled code would still be OK, because it > >> generates different code, but we'd have to do some testing. > >> > > > > There's another little wrinkle: even after updating the shared metaspace > > to use the search algorithm to find a 4G-aligned location, we can still > > end up with something like: > > > > CompressedKlassPointers::base() => 0x1100000000 > > CompressedKlassPointers::shift() => 3 > > > > (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) > > I remember that. I don't know why; it's not much use to us on AArch64. When CDS is enabled, the compressed klass encoding shift is set to LogKlassAlignmentInBytes with the intend for AOT compatibility. AOT requires LogKlassAlignmentInBytes (3) as the shift. Best, Jiangli > > > Here we can't use EOR because 0x1100000000 doesn't fit in the immediate, > > and we can't use MOVK because the shift is non-zero. > > > > I think the solution is to adjust the algorithm to search in increments > > of (4 << LogKlassAlignmentInBytes)*G once we hit 32G (the point at which > > we can no longer use a zero base). Then we can decode like this: > > > > const uint64_t shifted_base = > > (uint64_t)CompressedKlassPointers::base() >> > > CompressedKlassPointers::shift(); > > guarantee((shifted_base & 0xffffffff) == 0, "compressed class base > > bad alignment"); > > > > if (dst != src) movw(dst, src); > > movk(dst, shifted_base >> 32, 32); > > > > if (CompressedKlassPointers::shift() != 0) { > > lsl(dst, dst, LogKlassAlignmentInBytes); > > } > > > > What do you think? > > Could work, but can you also try disabling the shift with CDS? It doesn't do > us much good and it bloats code. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. 
> https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From vladimir.x.ivanov at oracle.com Mon Dec 2 16:44:19 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 2 Dec 2019 19:44:19 +0300 Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: References: Message-ID: <181c7d8a-3c6f-307d-5aa0-257586a190c7@oracle.com> > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 Compiler changes look good. Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Mon Dec 2 20:09:39 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 2 Dec 2019 12:09:39 -0800 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> Message-ID: +1 Thanks Vladimir > On Dec 2, 2019, at 5:38 AM, Tobias Hartmann wrote: > > Hi Vladimir, > >> On 29.11.19 16:28, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/ > > This looks good to me. > > Best regards, > Tobias From igor.ignatyev at oracle.com Mon Dec 2 21:19:00 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 2 Dec 2019 13:19:00 -0800 Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: References: Message-ID: <4B4C80C8-CF8A-4AA2-ACF8-7C67D7360A86@oracle.com> Hi Markus, looks good to me. -- Igor > On Dec 2, 2019, at 3:57 AM, Markus Gronlund wrote: > > Greetings, > > Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. > > Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. > > Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 > Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg > Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. This information has also been added to the Compilation event. The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. > > [1] https://bugs.openjdk.java.net/browse/JDK-8225554 > [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html > > Thank you > Markus From john.r.rose at oracle.com Tue Dec 3 05:04:50 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 2 Dec 2019 21:04:50 -0800 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <2BAEB1D3-46B7-4BA9-81A3-4F5E7B47B82A@gmail.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <2BAEB1D3-46B7-4BA9-81A3-4F5E7B47B82A@gmail.com> Message-ID: <1590A038-CA8D-4174-8FC3-F3BAC1A93E57@oracle.com> On Nov 18, 2019, at 8:43 PM, August Nagro wrote: > > And here?s a tangent to thing to think about: is growing HashMap?s backing array by powers of 2 actually a good thing, when the HashMap gets large? What if you instead wanted to grow by powers of 1.5, or even grow probabilistically, based on the collision rate, allocation pressure, or other data? With fast-range you can do this if you want. 
And without the performance hit of %! It's an interesting tangent; let me follow you along it for a bit. I think there may be interesting compromises between a size menu having only powers of two and a size-to-order policy, or a menu of powers of bases not related to 2, such as 1.5. In particular, if you can manage a ratio near to 1.414 you can cut your fragmentation costs roughly in half by putting all powers of 2 on the menu, plus those values times 1.414. (Or likewise a pair of multipliers near to 1.260 and 1.587.) If there were a clever ALU operation (or table lookup) cheaper than a full multiply or remainder that would get the effect of a fast range reduction just for the items on the menu, then it would be a good trade-off to stick to the menu, where the discounts apply, rather than pay for full-custom sizes. Both 1.4 (7/5) and 1.5 (3/2) look attractively simple, requiring some wrangling of approximate moduli of 3 or 5 or 7. Or, for 2^(1/3), 1.25 (5/4) and 1.66... (5/3) are also simple ratios. Or maybe there is a non-obvious number (not a simple ratio) which has a surprisingly simple modulus approximation, though I'm not holding my breath on that one. Since I'm hoping to stay under the cost of a multiply, I admit it's a tight ceiling, but I think there might be something there between the two extremes (2^n only and arbitrary sizes). -- John From john.r.rose at oracle.com Tue Dec 3 06:06:39 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 2 Dec 2019 22:06:39 -0800 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <877e3x0wji.fsf@oldenburg2.str.redhat.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> Message-ID: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> On Nov 18, 2019, at 12:17 PM, Florian Weimer wrote: > >> int bucket = fr(h * M); // M = 0x2357BD or something >> >> or maybe something fast and sloppy like: >> >> int bucket = fr(h + (h << 8)); Surely this one works, since fr is the final operation. The shift/add is just a mixing step to precondition the input. Just for the record, I'd like to keep brainstorming a bit more, though surely this sort of thing is of limited interest. So, just a little more from me on this. If we had BITR I'd want to try something like fr(h - bitr(h)). But what I keep wishing for is a good one- or two-cycle instruction that will mix all input bits into all the output bits, so that any change in one input bit is likely to cause cascading changes in many output bits, perhaps even 50% on average. A pair of AES steps is a good example of this. I think AES would be superior to multiply (for mixing) when used on 128 bit payloads or larger, so it looks appealing (to me) for vectorizable hashing applications. Though it is overkill on scalars, I think it points in promising directions for scalars also. >> >> or even: >> >> int bucket = fr(h) ^ (h & (N-1)); > Does this really work? I don't think so. Oops, you are right. Something like it might work, though. The idea, on paper, is that h & (N-1) is less than N, for any N >=1. And if N-1 has a high enough pop-count the information content is close to 50% of h (though maybe 50% of the bits are masked out). The xor of two quasi-independent values both less than N is, well, less than 2^(ceil lg N), not N, which is a bug. Oops.
There are ways to quickly combine two values less than N and reduce the result to less than N: You do a conditional subtract of N if the sum is >= N. So the tactical area I?m trying to explore here is to take two reduced hashes developed in parallel, which depend on different bits of the input, and combine them into a single stronger hash (not by ^). Maybe (I can?t resist hacking at it some more): int h1 = H1(h), h2 = H2(h); int bucket = CCA(h1 - h2, N); // where H1 := fr, H2(h) := (h & (N-1)) // where CCA(x, N) := x + ((x >> 31) & N) // Conditional Compensating Add In this framework, a better H2 for favoring the low bits of h might be H2(h) := ((1< I think this kind of perturbation is quite expensive. Arm's BITR should > be helpful here. Yes, BITR is a helpful building block. If I understand you correctly, it needs to be combined with other operations, such as multiply, shift, xor, etc., and can overcome biases towards high bits or towards low bits that come from the simple arithmetic definitions of the other mixing operations. The hack with CCA(h1 - h2, N) seems competitive with a BITR-based mixing step, since H2 can be very simple. A scalar variant of two AES steps (with xor of a second register or constant parameter at both stages) would be a better building block for strongly mixing bits. Or some other shallow linear network with a layer of non-linear S-boxes. > But even though this operation is commonly needed and > easily implemented in hardware, it's rarely found in CPUs. Yep; the whole cottage industry of building clever mixing functions out of hand calculator functions could be sidelined if CPUs gave us good cheap mixing primitives out of the box. The crypto literature is full of them, and many are designed to be easy to implement in silicon. ? John P.S. I mention AES because I?m familiar with that bit of crypto tech, and also because I actually tried it out once on the smhasher quality benchmark. No surprise in hindsight; it passes the quality tests with just two rounds. Given that it is as cheap as multiplication, and handles twice as many bits at a time, but requires two steps for full mixing, it would seem to be competitive with multiplication as a mixing step. It has no built-in biases towards high or low bits, so that?s an advantage over multiplication. Why two rounds? The one-round version has flaws, as a hash function, which are obvious on inspection of the simple structure of an AES round. Not every output bit is data-dependent on every input bit of one round, but two rounds swirls them all together. Are back-to-back AES rounds expensive? Maybe, although that?s how the instructions are designed to be used, about 10 of them back to back to do real crypto. From christian.hagedorn at oracle.com Tue Dec 3 07:12:42 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 3 Dec 2019 08:12:42 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <88c9da01-6865-4a92-67fd-2f52d6cfe84b@oracle.com> References: <88c9da01-6865-4a92-67fd-2f52d6cfe84b@oracle.com> Message-ID: <79476732-3ddb-1002-d4b1-f845cf76ee86@oracle.com> Thank you Tobias for your review! Best regards, Christian On 02.12.19 14:07, Tobias Hartmann wrote: > Hi Christian, > > looks reasonable to me. 
> > Best regards, > Tobias > > On 20.11.19 15:14, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8231501 >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.00/ >> >> The bug could be traced back to the concurrent cleaning of method data with its extra data in >> MethodData::clean_method_data() and the loading/copying of extra data for the ci method data in >> ciMethodData::load_extra_data(). I reproduced the bug by using the test [1] which extensively cleans >> method data by using the whitebox API [2]. >> >> Before loading and copying the extra data from the MDO to the ciMDO in >> ciMethodData::load_extra_data(), the metadata is prepared in a fixed-point iteration by cleaning all >> SpeculativeTrapData entries of methods whose klasses are unloaded [3]. If it encounters such a dead >> entry it releases the extra data lock (due to ranking issues) and tries again later [4]. This >> release of the lock triggers the bug: There can be cases where one thread A is waiting in the >> whitebox API method to get the extra data lock [2] to clean the extra data for the very same MDO for >> which another thread B just released the lock at [4]. If that MDO actually contained >> SpeculativeTrapData entries, then thread A cleaned those but the ciMDO, which thread B is preparing, >> still contains the uncleaned old MDO extra data (because thread B only made a snapshot of the MDO >> earlier at [5]). Things then go wrong when thread B can reacquire the lock after thread A. It tries >> to load the now cleaned extra data and immediately finishes at [6] since there are no >> SpeculativeTrapData entries anymore. It copied a single entry with tag DataLayout::no_tag [7] to the >> ciMDO which actually contained a SpeculativeTrapData entry. This results in a half way cleared entry >> (since a SpeculativeTrapData entry has an additional cell for the method) and possible other >> remaining SpeculativeTrapData entries: >> >> >> Let's assume a little-endian ordering and that both 0x00007fff... addresses are real pointers to >> methods. Tag 13 (0x0d) is used for SpeculativeTrapData and dp points to the first extra data entry: >> >> ciMDO extra data before thread B releases the lock at [4] (same extra data for MDO and ciMDO): >> 0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 >> dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63 >> dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68 >> dp+32: tag = 0 -> end of extra data >> >> MDO extra data after thread B reacquires the lock and thread A cleaned the MDO (ciMDO extra data is >> unchanged): >> 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 >> dp: tag = 0 -> end of extra data >> >> >> Returning at [6] when the extra data loading from MDO to ciMDO is finished: >> MDO extra data: >> 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 >> dp: tag = 0 -> end of extra data >> >> ciMDO extra data, only copied the first no_tag entry from MDO at [7] (8 bytes): >> 0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 >> dp: tag = 0 -> next entry = dp+8 >> dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal... 
>> >> >> The next time the ciMDO extra data is iterated, for example by using MethodData::next_extra(), it >> reads tag 99 after processing the first no_tag entry and jumping to the value at offset 8 which >> causes a crash since there is no tag 99 available. >> >> >> The fix is to completely zero out the current and all following SpeculativeTrapData entries if we >> encounter a no_tag in the MDO but a speculative_trap_data_tag tag in the ciMDO. There are also other >> cases where the method data is cleaned. Thus the bug is not only related to the whitebox API usage >> but occurs very rarely. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java >> >> [2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137 >> [4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115 >> [5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219 >> [6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191 >> [7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176 From nick.gasson at arm.com Tue Dec 3 07:43:19 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Tue, 3 Dec 2019 15:43:19 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> Message-ID: >> > >> > CompressedKlassPointers::base() => 0x1100000000 >> > CompressedKlassPointers::shift() => 3 >> > >> > (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) >> >> I remember that. I don't know why; it's not much use to us on AArch64. > > When CDS is enabled, the compressed klass encoding shift is set to > LogKlassAlignmentInBytes with the intend for AOT compatibility. AOT > requires LogKlassAlignmentInBytes (3) as the shift. > Yes, in AOTGraalHotSpotVMConfig.java we have: // AOT captures VM settings during compilation. For compressed oops this // presents a problem for the case when the VM selects a zero-shift mode // (i.e., when the heap is less than 4G). Compiling an AOT binary with // zero-shift limits its usability. As such we force the shift to be // always equal to alignment to avoid emitting zero-shift AOT code. CompressEncoding vmOopEncoding = super.getOopEncoding(); aotOopEncoding = new CompressEncoding(vmOopEncoding.getBase(), logMinObjAlignment()); CompressEncoding vmKlassEncoding = super.getKlassEncoding(); aotKlassEncoding = new CompressEncoding(vmKlassEncoding.getBase(), logKlassAlignment); I get why AOT needs to set the shift for compressed OOPs, but for compressed class pointers the maximum class space size is 3G so the shift is only useful to allow a zero base. For AOT+CDS the default class space load address is 0x800000000 (32G) which is above the limit where a zero base is possible. 
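For reference, the decode being discussed is just this arithmetic (an illustrative sketch in Java, not the HotSpot code; the real AArch64 sequence is the MacroAssembler snippet quoted earlier in the thread):

class CompressedKlassDecode {
    // A 32-bit narrow class pointer expands to a Klass* address as
    // base + (narrowKlass << shift). With shift == 0 the decode is a single
    // add (or nothing, with a zero base); with a non-zero base and a non-zero
    // shift it typically needs an extra instruction, unless base >> shift is
    // 4G-aligned, which is what makes the MOVK + LSL trick above work.
    static long decode(int narrowKlass, long base, int shift) {
        return base + ((narrowKlass & 0xFFFFFFFFL) << shift);
    }
}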
So I think it would be better if AOT and CDS used a zero shift by default, as on AArch64 and X86 at least, we generate less efficient code when both the shift and base are non-zero. Using logKlassAlignment above works well for non-CDS, as it allows the class space to mapped at a low address with zero base. Maybe aotKlassEncoding could be set from the current VM settings instead? Thanks, Nick From vladimir.x.ivanov at oracle.com Tue Dec 3 08:09:16 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 11:09:16 +0300 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> Message-ID: Thanks for reviews, Vladimir and Tobias. Best regards, Vladimir Ivanov On 02.12.2019 23:09, Vladimir Kozlov wrote: > +1 > > Thanks > Vladimir > >> On Dec 2, 2019, at 5:38 AM, Tobias Hartmann wrote: >> >> Hi Vladimir, >> >>> On 29.11.19 16:28, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/ >> >> This looks good to me. >> >> Best regards, >> Tobias > From tobias.hartmann at oracle.com Tue Dec 3 08:18:22 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 3 Dec 2019 09:18:22 +0100 Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> References: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Message-ID: Hi, this looks good to me. Best regards, Tobias On 30.11.19 02:02, Ioi Lam wrote: > Hi Pengfei, > > I have cc-ed hotspot-compiler-dev at openjdk.java.net. > > Please do not push the patch until someone from hotspot-compiler-dev has looked at it. > > Many people are away due to Thanksgiving in the US. > > Thanks > - Ioi > > On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote: >> Hi, >> >> Please help review this small fix for 64-bit client build. >> >> Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 >> >> Current 64-bit client VM build fails because errors occurred in dumping >> the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1] which >> runs "java -Xshare:dump" after linking the JDK image. But for Client VM >> build on 64-bit platforms, the ergonomic flag UseCompressedOops is not >> set.[2] This leads to VM exits in checking the flags for dumping the >> shared archive.[3] >> >> This change removes the "#if defined" macro to make shared archive dump >> successful in 64-bit client build. By tracking the history of the macro, >> I found it is initially added as "#ifndef COMPILER1"[4] 10 years ago >> when C1 did not have a good support of compressed oops and modified to >> current shape[5] in the implementation of tiered compilation. It should >> be safe to be removed today. >> >> This patch also fixes another client build issue on AArch64. 
>> >> [1] http://openjdk.java.net/jeps/341 >> [2] >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551 >> [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 >> [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 >> >> -- >> Thanks, >> Pengfei >> > From vladimir.x.ivanov at oracle.com Tue Dec 3 09:32:48 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 12:32:48 +0300 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> Message-ID: <5ec39d06-8f9c-f731-bd8f-7eeef85147f2@oracle.com> > but two rounds swirls them all together. Are back-to-back AES rounds > expensive? Maybe, although that?s how the instructions are designed to > be used, about 10 of them back to back to do real crypto. Throughput-oriented implementation should work fine for crypto purposes, but AESENC does look very good on recent Intel micro-architectures from latency perspective as well (data from [1] [2]): it improved from 8/1 on Sandy Bridge and 7/1 on Haswell to 4/1 on Skylake and it's listed (on uops.info [2]) as 3/1 on Ice Lake which is on par with IMUL (while processing twice as much bits). And vector variant (VAESENC) has the same latency as scalar (8->7->4->3 [3]) which looks very appealing for throughput-oriented use cases. Best regards, Vladimir Ivanov [1] https://www.agner.org/optimize/instruction_tables.pdf [2] https://uops.info/html-lat/ICL/AESENC_XMM_XMM-Measurements.html [3] https://uops.info/html-instr/VAESENC_XMM_XMM_XMM.html From markus.gronlund at oracle.com Tue Dec 3 10:02:25 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Tue, 3 Dec 2019 02:02:25 -0800 (PST) Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: <4B4C80C8-CF8A-4AA2-ACF8-7C67D7360A86@oracle.com> References: <4B4C80C8-CF8A-4AA2-ACF8-7C67D7360A86@oracle.com> Message-ID: Igor, Vladimir and Erik, Thank you for your reviews! Markus -----Original Message----- From: Igor Ignatyev Sent: den 2 december 2019 22:19 To: Markus Gronlund Cc: hotspot-jfr-dev at openjdk.java.net; hotspot compiler Subject: Re: RFR(M): 8216041: [Event Request] - Deoptimization Hi Markus, looks good to me. -- Igor > On Dec 2, 2019, at 3:57 AM, Markus Gronlund wrote: > > Greetings, > > Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. > > Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. > > Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 > Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg > Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. This information has also been added to the Compilation event. 
The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. > > [1] https://bugs.openjdk.java.net/browse/JDK-8225554 > [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html > > Thank you > Markus From xxinliu at amazon.com Tue Dec 3 11:00:41 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 3 Dec 2019 11:00:41 +0000 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array Message-ID: Hi, C1 misses a membar for the slow path of alloc_array or alloc_obj on aarch64. We met this problem when we ran jcstress on an AWS Graviton processor, which might reorder stores. Without a storestore membar, a put_field behind it might commit to memory ahead of the array/object initialization. This change tries to fix that by adjusting the bound location. Could reviewers help me to review this change? Bug: https://bugs.openjdk.java.net/browse/JDK-8234977 Webrev: https://cr.openjdk.java.net/~xliu/8234977/00/webrev/ Validation: I passed jcstress and hotspot tier1 on aarch64. Thanks, --lx From vladimir.x.ivanov at oracle.com Tue Dec 3 11:49:14 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 14:49:14 +0300 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com> References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com> Message-ID: Thanks a lot for the review, John. Incremental change: http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.01-00 (Also, extended idealreg2regmask refactoring to non-vector registers.) > In machnode.cpp I think the code would be easier to read if a common > subexpression were assigned a name 'num_edges' as in similar places: > > uint num_edges = _opnds[opcnt]->num_edges(); > > (Alternatively, you could just increment 'skipped' on the fly, which > would be OK too.) I'm being picky because it cost me a second to verify > that the set of raw edges was processed exhaustively. This is a task > every reader of the code will have to do. Agree, fixed. > Another nit: There is some dissonance between the seemingly general > name postselect_cleanup and supports_generic_vector_operands, which > is a very specific name. But they refer to the same condition. Pick the > shorter name, maybe, and convert the longer one into a nice comment. > It's also not clear that helpers like process_mach_node and get_vector_operand > are part of postselect_cleanup. Should they be cleanup_mach_node and > cleanup_vector_operand? It's a nit-pick, but might help future readers. Agree, fixed. > The name get_vector_operand is particularly bad, since it suggests that > it accesses something previously computed, where in fact it transforms > the graph. Also, IMO, 'get_' is one of those noise words which offers little > help to the reader and just takes up valuable screen space. Agree, fixed. > The name clone_generic_vector_operand is confusing; I would expect it > to be called something like [specialize,cleanup,...]_generic_vector_operand. Agree, fixed. > There's a funny condition reported in this comment: > // RShiftCntV/RShiftCntV report wide vector type, but VecS as ideal register. > > It seems to come out of the blue sky. Maybe add a cross-referencing comment > between that and the vshiftcnt instruction in the AD file? (Are there asserts > that would catch similar oddities if they were to arise? Was this one caught > via an assert? 
I certainly hope it was, rather than by debugging bad code!) It comes not from vshiftcnt, but from the ideal node declaration: class LShiftCntVNode : public VectorNode { public: LShiftCntVNode(Node* cnt, const TypeVect* vt) : VectorNode(cnt,vt) {} virtual int Opcode() const; virtual uint ideal_reg() const { return Matcher::vector_shift_count_ideal_reg(vect_type()->length_in_bytes()); } }; Yes, complete migration to generic vector operands helps with catching such cases. It leads to concrete vector type mismatches at the IR level and those are caught with asserts. Anyway, I added the reference to vectornode.hpp. > On a similar note (about asserts), I'm very glad to see verify_mach_nodes. > The name is a little non-specific. Maybe verify_after_postselect_cleanup. Agree, fixed. > Why do vector[xy]_reg_legacy and vectorz_reg_legacy get different treatments > in the change set? I'm mainly curious here. The vectorz_reg_vl thing is a > dynamic set (?) which is fine, but why is it needed for z and not xy? A comment > might help. Also, this gripe is not part of this review, but I'll make it anyway: > The very very short acronym "vl" which appears here stands for "AVX512VL" > referring to "variable length" but it bumps into the phrase "vector legacy" > with an unfortunate occasion for confusion. Suggest "_vl" be renamed to > "_512vl" or some other more specific thing. I agree that the _vl postfix is cryptic. I'll file an RFE to investigate possible cleanups. (One of the ideas was to completely eliminate dynamic register masks for vectors and get rid of (leg)vec[SDXYZ] along the way, but it was left for a future enhancement.) > For the string intrinsics, there's a regular replacement of legRegS by legRegD. > That strikes me as potentially a semantic change. I suppose the register > allocator will do the same things in both cases, and there's no spill code > generated by the JIT for such a temp. I wonder, if the change was necessary, > how do we know that all the occurrences were correctly found and changed? > (Also, the string intrinsic macros use AVX512 instructions when available, > and in theory those would require heftier register masks.) Can someone > comment on this, for the record - maybe even in the source code? The change is from legVecS to legRegD. It doesn't cause any behavioral changes, but aligns the declaration with the actual behavior. Regarding the Vec=>Reg part, string intrinsic nodes don't advertise themselves as vector nodes. For example, they don't set C->max_vector_size() and are still used when vectors are disabled (-XX:MaxVectorSize=0). Such problems are caught during graph verification with -XX:MaxVectorSize=0 or when there are no vector operations present. The S=>D change aligns x86_32.ad and x86_64.ad. The 32-bit version consistently uses regD everywhere. Instead of migrating both x86_32.ad and x86_64.ad, it was decided to stick with regD/legVecD. > I suggest that the warning comments "(leg)Vec should be used instead" could > be a little less cryptic. An unconditional warning like this makes the reader > wonder, "so why is it here at all?" Maybe use a cross-reference as in: > // Replaces (leg)vec during post-selection cleanup. See above. Agree, fixed. Best regards, Vladimir Ivanov > On Nov 19, 2019, at 6:30 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234391 >> >> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones.
>> >> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.) >> >> On a high-level it is organized as follows: >> >> (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec; >> >> (2) at runtime, right after matching is over, a special pass is performed which does: >> >> * replaces vecOper with vec[SDXYZ] depending on mach node type >> - vector mach nodes capute bottom_type() of their ideal prototype; >> >> * eliminates redundant reg-to-reg vector moves (MoveVec2Leg /MoveLeg2Vec) >> - matcher needs them, but they are useless for register allocator (moreover, may cause additional spills); >> >> >> (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands. >> >> >> Some details: >> >> (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works >> >> >> (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is enabled only on x86 >> >> >> (3) post-selection analysis is implemented as a single pass over the graph and processing individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() (which is an instance of TypeVect) >> >> >> (4) most of the analysis is cross-platform and interface with platform-specific code through 3 methods: >> >> static bool is_generic_reg2reg_move(MachNode* m); >> // distinguishes MoveVec2Leg/MoveLeg2Vec nodes >> >> static bool is_generic_vector(MachOper* opnd); >> // distinguishes vec/legVec operands >> >> static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg); >> // constructs fixed-sized vector operand based on ideal reg >> // vec + Op_Vec[SDXYZ] => vec[SDXYZ] >> // legVec + Op_Vec[SDXYZ] => legVec[SDXYZ] >> >> >> (5) TEMP operands are handled specially: >> - TEMP uses max_vector_size() to determine what fixed-sized operand to use >> * it is needed to cover reductions which don't produce vectors but scalars >> - TEMP_DEF inherits fixed-sized operand type from DEF; >> >> >> (6) there is limited number of special cases for mach nodes in Matcher::get_vector_operand_helper: >> >> - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not ideal_reg(). >> >> - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type. >> >> >> (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask) >> >> >> (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands >> >> >> (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD >> - it aligns the code between x86_64.ad and x86_32.ad >> - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0) >> >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL, >> performance testing (SPEC* + Octane + micros / G1 + ParGC). 
>> >> Best regards, >> Vladimir Ivanov >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> >> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf > From christoph.goettschkes at microdoc.com Tue Dec 3 12:22:49 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Tue, 3 Dec 2019 13:22:49 +0100 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: <20191128124549.BCDF0C6F24@aojmv0009> References: <20191127125735.B9BE111F377@aojmv0009> <20191128124549.BCDF0C6F24@aojmv0009> Message-ID: Hi Vladimir, could you have a look at my updated webrev regarding this failing test? https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ See my inline comments in the mail below. Thanks, Christpoh "hotspot-compiler-dev" wrote on 2019-11-28 13:44:05: > From: christoph.goettschkes at microdoc.com > To: Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Date: 2019-11-28 13:46 > Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > client VMs due to Unrecognized VM option LoopUnrollLimit > Sent by: "hotspot-compiler-dev" > > Hi Vladimir, > > Vladimir Kozlov wrote on 2019-11-27 20:54:02: > > > From: Vladimir Kozlov > > To: christoph.goettschkes at microdoc.com, > hotspot-compiler-dev at openjdk.java.net > > Date: 2019-11-27 20:54 > > Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > > client VMs due to Unrecognized VM option LoopUnrollLimit > > > > Hi Christoph > > > > I was about suggest IgnoreUnrecognizedVMOptions flag but remembered > > discussion about 8231954 fix. > > Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our previous > discussion. I also think that it doesn't make sense to execute tests in VM > configurations for which they are not written for. Most of the compiler > tests simply have "IgnoreUnrecognizedVMOptions" and probably waste a good > amount of time in certain VM configurations. > > > But I think the test should be run with Graal - it does have OSR > > compilation and we need to test it. > > Sure. I disabled it, because I thought that the flag "LoopUnrollLimit" is > required to trigger the faulty behavior, but I don't know much about > optimization in the graal JIT. > > > > > We can do it by splitting test runs (duplicate @test block with > > different run flags) to have 2 tests with different > > flags and conditions. See [1]. > > > > For existing @run block we use `@requires vm.compiler2.enabled` and for > > new without LoopUnrollLimit - `vm.graal.enabled`. > > I did the following: > > https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > > Could you elaborate how the two flags are related? I though, if graal is > used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set > to true. Is that correct? I don't have a setup with graal, so I can not > test this. > > Thanks, > Christoph > From martin.doerr at sap.com Tue Dec 3 14:48:02 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 3 Dec 2019 14:48:02 +0000 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array In-Reply-To: References: Message-ID: Hi, I think there are already sufficient StoreStore barriers: Regarding the Runtime1 calls, there's a full fence in ~ThreadInVMfromJava() (introduced by JRT_ENTRY) which is called before returning from the C++ code. 
Seems like the eden allocation parts have explicit StoreStore barriers AFAICS. Did I miss anything? Which path is it exactly where you miss such a barrier? Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Liu, Xin > Sent: Dienstag, 3. Dezember 2019 12:01 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: [DMARC FAILURE] RFR(XS): 8234977: [Aarch64] C1 lacks a membar > store in slowpath of allocate_array > > Hi, > > C1 misses a member for the slow path of alloc_array or alloc_obj on aarch64. > We met this problem when we ran jcstress on AWS graviton processor, > which might reorder stores. Without a storestore member, put_field behind > might commit to memory ahead of array/object initialization. > This change tries to fix that by adjusting bound location. Could reviewers help > me to review this change? > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234977 > Webrev: https://cr.openjdk.java.net/~xliu/8234977/00/webrev/ > > Validation: I passed jcstress and hotspot tier1 on arch64. > > Thanks, > --lx > > From tobias.hartmann at oracle.com Tue Dec 3 15:10:34 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 3 Dec 2019 16:10:34 +0100 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() Message-ID: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8234616 http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because 'macro_idx' is out of the upper bound: https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 The problem is that in the previous iteration, we hit the following code path which removes an unreachable macro node but does not decrement 'macro_idx': https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 The problem was introduced in JDK 14 by the fix for JDK-8227384: https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 I've changed the loop to a for-loop to make sure the index is always decremented and also strengthened the asserts. I was not able to create a regression test but the issue reproduces intermittently with Lucene. Thanks, Tobias From vladimir.x.ivanov at oracle.com Tue Dec 3 15:16:43 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 18:16:43 +0300 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> Message-ID: <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> > http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ Looks good. 
Best regards, Vladimir Ivanov > > We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because > 'macro_idx' is out of the upper bound: > https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 > > The problem is that in the previous iteration, we hit the following code path which removes an > unreachable macro node but does not decrement 'macro_idx': > https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 > > The problem was introduced in JDK 14 by the fix for JDK-8227384: > https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 > > I've changed the loop to a for-loop to make sure the index is always decremented and also > strengthened the asserts. I was not able to create a regression test but the issue reproduces > intermittently with Lucene. > > Thanks, > Tobias > From martin.doerr at sap.com Tue Dec 3 15:26:51 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 3 Dec 2019 15:26:51 +0000 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <8736e4ayfa.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> Message-ID: Hi Roland, the new asserted condition can never be true when mode == IgnoreStripMined. So if this is intended, I suggest using 2 assertions: 1. mode != IgnoreStripMined at the beginning 2. (mode == ControlAroundStripMined && use == sfpt) ||!use->is_reachable_from_root() Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Roland Westrelin > Sent: Sonntag, 1. Dezember 2019 15:55 > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && > (use == sfpt || !use->is_reachable_from_root())) failed: missed a node > > > > http://cr.openjdk.java.net/~roland/8234350/webrev.00/ > > Anyone for a second review? > > Thanks, > Roland. From tobias.hartmann at oracle.com Tue Dec 3 15:30:52 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 3 Dec 2019 16:30:52 +0100 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> Message-ID: <51ce8912-e331-d37b-b846-814d089c6cb1@oracle.com> Thanks Vladimir! Best regards, Tobias On 03.12.19 16:16, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because >> 'macro_idx' is out of the upper bound: >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 >> >> The problem is that in the previous iteration, we hit the following code path which removes an >> unreachable macro node but does not decrement 'macro_idx': >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 >> >> The problem was introduced in JDK 14 by the fix for JDK-8227384: >> https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 >> >> I've changed the loop to a for-loop to make sure the index is always decremented and also >> strengthened the asserts. 
I was not able to create a regression test but the issue reproduces >> intermittently with Lucene. >> >> Thanks, >> Tobias >> From aph at redhat.com Tue Dec 3 16:17:43 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 3 Dec 2019 11:17:43 -0500 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array In-Reply-To: References: Message-ID: On 12/3/19 6:00 AM, Liu, Xin wrote: > C1 misses a member for the slow path of alloc_array or alloc_obj on aarch64. I don't understand this. In the slow path, the MemAllocator::finish should insert the necessary membar. Can you show us the slow path which does not do the membar? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Tue Dec 3 18:18:12 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 10:18:12 -0800 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> Message-ID: +1 Vladimir K On 12/3/19 7:16 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because >> 'macro_idx' is out of the upper bound: >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 >> >> The problem is that in the previous iteration, we hit the following code path which removes an >> unreachable macro node but does not decrement 'macro_idx': >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 >> >> The problem was introduced in JDK 14 by the fix for JDK-8227384: >> https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 >> >> I've changed the loop to a for-loop to make sure the index is always decremented and also >> strengthened the asserts. I was not able to create a regression test but the issue reproduces >> intermittently with Lucene. >> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Tue Dec 3 18:33:52 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 10:33:52 -0800 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: <20191203122444.7D54F13FFFC@aojmv0009> References: <20191127125735.B9BE111F377@aojmv0009> <20191128124549.BCDF0C6F24@aojmv0009> <20191203122444.7D54F13FFFC@aojmv0009> Message-ID: You don't need to duplicate @bug (it was C2 bug anyway). And don't need to check & !vm.graal.enabled for C2 case (see explanation below) > Could you elaborate how the two flags are related? I though, if graal is > used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set > to true. Is that correct? I don't have a setup with graal, so I can not > test this. When we enable Graal JIT it is used instead of C2. They are mutually exclusive. Compiler.isC2Enabled() (which sets vm.compiler2.enabled [1]) returns false when isGraalEnabled() returns true [2]. 
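For illustration, such a split can be sketched as two jtreg test descriptions in one file (this is only a sketch: the @summary texts, the -Xbatch/-XX:LoopUnrollLimit values and the exact run lines are placeholders, not the contents of Christoph's webrev; only the @requires keys come from this thread):

/*
 * @test
 * @requires vm.compiler2.enabled
 * @summary C2 run keeps the C2-only LoopUnrollLimit flag
 * @run main/othervm -Xbatch -XX:LoopUnrollLimit=100 TestDivZeroCheckControl
 */

/*
 * @test
 * @requires vm.graal.enabled
 * @summary Graal run drops the C2-only flag so the OSR path is still exercised
 * @run main/othervm -Xbatch TestDivZeroCheckControl
 */

jtreg treats each such comment as a separate test description for the same class, so the C2 run keeps the unroll-limit tuning while the Graal run only loses that one flag.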
Regards, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/jtreg-ext/requires/VMProps.java#l117 [2] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/lib/sun/hotspot/code/Compiler.java#l80 On 12/3/19 4:22 AM, christoph.goettschkes at microdoc.com wrote: > Hi Vladimir, > > could you have a look at my updated webrev regarding this failing test? > https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > > See my inline comments in the mail below. > > Thanks, > Christpoh > > "hotspot-compiler-dev" > wrote on 2019-11-28 13:44:05: > >> From: christoph.goettschkes at microdoc.com >> To: Vladimir Kozlov >> Cc: hotspot-compiler-dev at openjdk.java.net >> Date: 2019-11-28 13:46 >> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for >> client VMs due to Unrecognized VM option LoopUnrollLimit >> Sent by: "hotspot-compiler-dev" > >> >> Hi Vladimir, >> >> Vladimir Kozlov wrote on 2019-11-27 > 20:54:02: >> >>> From: Vladimir Kozlov >>> To: christoph.goettschkes at microdoc.com, >> hotspot-compiler-dev at openjdk.java.net >>> Date: 2019-11-27 20:54 >>> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > >>> client VMs due to Unrecognized VM option LoopUnrollLimit >>> >>> Hi Christoph >>> >>> I was about suggest IgnoreUnrecognizedVMOptions flag but remembered >>> discussion about 8231954 fix. >> >> Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our > previous >> discussion. I also think that it doesn't make sense to execute tests in > VM >> configurations for which they are not written for. Most of the compiler >> tests simply have "IgnoreUnrecognizedVMOptions" and probably waste a > good >> amount of time in certain VM configurations. >> >>> But I think the test should be run with Graal - it does have OSR >>> compilation and we need to test it. >> >> Sure. I disabled it, because I thought that the flag "LoopUnrollLimit" > is >> required to trigger the faulty behavior, but I don't know much about >> optimization in the graal JIT. >> >>> >>> We can do it by splitting test runs (duplicate @test block with >>> different run flags) to have 2 tests with different >>> flags and conditions. See [1]. >>> >>> For existing @run block we use `@requires vm.compiler2.enabled` and > for >>> new without LoopUnrollLimit - `vm.graal.enabled`. >> >> I did the following: >> >> https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ >> >> Could you elaborate how the two flags are related? I though, if graal is > >> used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are > set >> to true. Is that correct? I don't have a setup with graal, so I can not >> test this. >> >> Thanks, >> Christoph >> > From john.r.rose at oracle.com Tue Dec 3 18:59:14 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 3 Dec 2019 10:59:14 -0800 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com> Message-ID: <675F00CB-AD08-4B08-BAAD-6346DD7A89D8@oracle.com> On Dec 3, 2019, at 3:49 AM, Vladimir Ivanov wrote: > > Thanks a lot for the review, John. You are welcome. Glad to be of assistance. 
From sandhya.viswanathan at intel.com Tue Dec 3 21:33:10 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 3 Dec 2019 21:33:10 +0000 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations Message-ID: For vector replicate and reduction operations vinsert and vextract instructions are used. When UseAVX level is set to 3, these instructions are unnecessarily encoded with 512-bit vector width. Only for KNL platform which doesn't support AVX512 variable length encoding, the 512-bit wide instruction need to be used. All other Xeon platforms should use the appropriate 256-bit wide vector instruction. JBS: https://bugs.openjdk.java.net/browse/JDK-8235288 Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ Please review and approve. Best Regards, Sandhya From vladimir.kozlov at oracle.com Tue Dec 3 22:41:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 14:41:21 -0800 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations In-Reply-To: References: Message-ID: <3C7A0114-AD7B-4032-8D50-0F2D31EB97BC@oracle.com> Looks good. Thanks Vladimir > On Dec 3, 2019, at 1:33 PM, Viswanathan, Sandhya wrote: > > For vector replicate and reduction operations vinsert and vextract instructions are used. > When UseAVX level is set to 3, these instructions are unnecessarily encoded with 512-bit vector width. > Only for KNL platform which doesn't support AVX512 variable length encoding, the 512-bit wide instruction need to be used. > All other Xeon platforms should use the appropriate 256-bit wide vector instruction. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235288 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From claes.redestad at oracle.com Tue Dec 3 23:38:55 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 4 Dec 2019 00:38:55 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: Hi Thomas and others, thanks for the thorough feedback! I also had offline discussions about the existing utilities in zUtils.inline.hpp with Per Liden, and decided to try and work this into a templated solution that enables both 32-bit and 64-bit implementations to work correctly - along with signed variants (especially nice with types such as size_t). http://cr.openjdk.java.net/~redestad/8234331/open.03 This is a pretty significant rework. Changes since .01: - introduce utilities/powerOfTwo.hpp - move round_up_* and round_down_* from zUtils, re-implement to use the new implementation - implement next_* as round_up_*(value + 1) as per John's suggestion. - added tests - corrected the issues Thomas, Ivan and others pointed out, mainly ensuring we don't depend on undefined behavior in neither product code, asserts nor tests - .. and ensure to use next and round_up as appropriate to preserve behavior Other notes: - Moving existing power-of-two functions like is_power_of_2 from globalDefinitions.hpp to powerOfTwo.hpp would be straightforward, but tedious. I'd like to defer this to a follow-up - The xlc implementation is untested, but should work. If someone can verify, I'd be much obliged. 
- Many thanks to Erik ?sterlund, who guided me through a maze of undefined behavior and template metaprogramming to a workable and relatively clean implementation - HotSpot shrinks by ~15Kb :-) Testing: tier1-5 On 2019-11-28 08:34, Thomas St?fe wrote: > Hi Claes, > > I think this is useful. Why not a 64bit variant too? If you do not want > to go through the hassle of providing a count_leading_zeros(uint64_t), > you could call the 32bit variant twice and take care of endianness for > the caller. > > -- > > In inline int32_t next_power_of_two(int32_t value) , should we weed out > negative input values right away instead of asserting at the end of the > function? > > -- > > The functions will always return the next power of two, even if the > input is a power of two - e.g. "2" for "1". Is that intended? It would > be nice to have an API comment in the header describing these corner > cases (what happens for negative input, what happens if input is power 2). > > -- > > The patch can cause subtle differences in some caller code, I think, if > input value is a power of 2 already. See e.g: > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html > > - ?i=16; > - ?while( i < size ) i <<= 1; > + ?i = MAX2(16, (int)next_power_of_two(size)); > > If i == size == 16, old code would keep i==16, new code would come to > i==32, I think. > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html > > ?//------------------------------round_up--------------------------------------- > ?// Round up to nearest power of 2 > -uint NodeHash::round_up( uint x ) { > - ?x += (x>>2); ? ? ? ? ? ? ? ? ?// Add 25% slop > - ?if( x <16 ) return 16; ? ? ? ?// Small stuff > - ?uint i=16; > - ?while( i < x ) i <<= 1; ? ? ? // Double to fit > - ?return i; ? ? ? ? ? ? ? ? ? ? // Return hash table size > +uint NodeHash::round_up(uint x) { > + ?x += (x >> 2); ? ? ? ? ? ? ? ? ?// Add 25% slop > + ?return MAX2(16U, next_power_of_two(x)); > ?} > > same here. If x == 16, before we'd return 16, now 32. > > --- > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > I admit I do not understand the current coding :) I do not believe it > works for all input values, e.g. were > get_java_thread_list()->length()==1025, we'd get 1861 - if I am not > mistaken. Your code is definitely clearer but not equivalent to the old one. FTR, the algorithm used is described here: http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2 (1025 should round up to 2048) Thanks! /Claes > > --- > > In the end, I wonder whether we should have two kind of APIs, or a > parameter, distinguishing between "next power of 2" and "next power of 2 > unless input value is already power of 2". > > Cheers, Thomas > > > > > > On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad > > wrote: > > Hi, > > in various places in the hotspot we have custom code to calculate the > next power of two, some of which have potential to go into an infinite > loop in case of an overflow. > > This patch proposes adding next_power_of_two utility methods which > avoid infinite loops on overflow, while providing slightly more > efficient code in most cases. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234331 > Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/ > > Testing: tier1-3 > > Thanks! 
> > /Claes > From igor.ignatyev at oracle.com Wed Dec 4 01:49:02 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 3 Dec 2019 17:49:02 -0800 Subject: RFR(S) : 8129092 : compiler/intrinsics/classcast/NullCheckDroppingsTest.java testVarClassCast() can fail Message-ID: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 > 69 lines changed: 38 ins; 2 del; 29 mod; Hi all, could you please review the small patch which makes the test more robust by using newly introduced JFR Deoptimization event? the test used to use WhiteBox.isMethodCompiled to check if there was deoptimization, which, in case uncommon trap's action is none or maybe_recompile (which has been seen), is incorrect. the patch replaces WhiteBox method call w/ asserting JFR events, and check not only that deoptimization happens, but also check its reason is null_check. testing that, I noticed that 'testVarClassCast' doesn't hit uncommon trap due to null_check, but because of unstable_if in the ternary operator at L#180, so I modified the test to pass an instance of Class, which required some small changes. JBS: https://bugs.openjdk.java.net/browse/JDK-8129092 webrev: http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 testing: compiler/intrinsics/classcast/NullCheckDroppingsTest.java Thanks, -- Igor From john.r.rose at oracle.com Wed Dec 4 02:28:17 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 3 Dec 2019 18:28:17 -0800 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: <3A61B685-1311-48D2-92F8-0ABE95F9EFD5@oracle.com> On Dec 3, 2019, at 3:38 PM, Claes Redestad wrote: > > http://cr.openjdk.java.net/~redestad/8234331/open.03 Nice work! A few minor complaints follow: I?m not totally comfortable with the proliferation of tiny header files. Are we crashing from one extreme (globalDefinitions.hpp) toward the other (usefulZeroConstant.hpp)? So having count_leading_zeros.hpp is arguably a correction, but then adding powerOfTwo.hpp feels like an over-correction. To be followed by more over-corrections: count_trailing_zeroes.hpp, populationCount.hpp, and so on. You won?t be surprised to hear that I think this would be excessive splitting, and that a lumpier outcome would feel better to me, as a successor to the chaos globalDefinitions.hpp. I propose putting anything involving integral bitmask operations into intBitmasks.hpp, including today?s count_trailing_zeroes.hpp and all the new stuff that will be like it. Start by renaming count_trailing_zeroes.hpp and rolling the new stuff into it. I can wait for this, but I?m going to grumble some more if we keep going down this road to TinyHeaderVille. This particular code : + if (high != 0) { + _BitScanReverse(&index, x); + } else { + _BitScanReverse(&index, x); + index += 32; + } is clearly untested, since it?s wrong. The `index+=32` belongs on the other branch. (Or is it me?) The root cause of this is the needless obfuscation of the `index ^= 31` and `index ^= 63`. Really?? Just say `31 - index` and `63 - index`, and the arithmetic reasoning will take care of itself. In the new code, `assert(lz > ?)` and `assert(lz < ?)` statements are in an unpredictable relative order, making it harder to compare and contrast the four versions. Suggest putting the < before the > case more regularly. To `next_power_of_2` I suggest adding an assert that the increment will not overflow. 
The asserts won't catch `next_power_of_2(max_jint)`, not for sure. ? John B. Twiddler From vladimir.kozlov at oracle.com Wed Dec 4 02:41:23 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 18:41:23 -0800 Subject: RFR(S) : 8129092 : compiler/intrinsics/classcast/NullCheckDroppingsTest.java testVarClassCast() can fail In-Reply-To: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> References: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> Message-ID: LGTM Thanks, Vladimir K On 12/3/19 5:49 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 >> 69 lines changed: 38 ins; 2 del; 29 mod; > > Hi all, > > could you please review the small patch which makes the test more robust by using newly introduced JFR Deoptimization event? > > the test used to use WhiteBox.isMethodCompiled to check if there was deoptimization, which, in case uncommon trap's action is none or maybe_recompile (which has been seen), is incorrect. the patch replaces WhiteBox method call w/ asserting JFR events, and check not only that deoptimization happens, but also check its reason is null_check. testing that, I noticed that 'testVarClassCast' doesn't hit uncommon trap due to null_check, but because of unstable_if in the ternary operator at L#180, so I modified the test to pass an instance of Class, which required some small changes. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8129092 > webrev: http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 > testing: compiler/intrinsics/classcast/NullCheckDroppingsTest.java > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Wed Dec 4 05:09:24 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 3 Dec 2019 21:09:24 -0800 Subject: RFR(S) : 8129092 : compiler/intrinsics/classcast/NullCheckDroppingsTest.java testVarClassCast() can fail In-Reply-To: References: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> Message-ID: <69FA66AF-51FE-4235-BB8B-773EB884E295@oracle.com> thanks Vladimir, pushed. -- Igor > On Dec 3, 2019, at 6:41 PM, Vladimir Kozlov wrote: > > LGTM > > Thanks, > Vladimir K > > On 12/3/19 5:49 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 >>> 69 lines changed: 38 ins; 2 del; 29 mod; >> Hi all, >> could you please review the small patch which makes the test more robust by using newly introduced JFR Deoptimization event? >> the test used to use WhiteBox.isMethodCompiled to check if there was deoptimization, which, in case uncommon trap's action is none or maybe_recompile (which has been seen), is incorrect. the patch replaces WhiteBox method call w/ asserting JFR events, and check not only that deoptimization happens, but also check its reason is null_check. testing that, I noticed that 'testVarClassCast' doesn't hit uncommon trap due to null_check, but because of unstable_if in the ternary operator at L#180, so I modified the test to pass an instance of Class, which required some small changes. 
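As a rough illustration of the approach described above (asserting JFR events instead of WhiteBox state), recording and checking a deoptimization could look something like the sketch below. The event name "jdk.Deoptimization" and the "reason" field are assumptions based on this thread's event request, and the actual test uses the jtreg/JFR test-library helpers rather than this hand-rolled check:

import java.nio.file.Path;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class DeoptEventSketch {
    public static void main(String[] args) throws Exception {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.Deoptimization"); // assumed event name
            recording.start();
            // ... run the code that is expected to hit the uncommon trap ...
            recording.stop();
            Path dump = Path.of("deopt.jfr");
            recording.dump(dump);

            boolean sawNullCheck = false;
            for (RecordedEvent event : RecordingFile.readAllEvents(dump)) {
                if (event.getEventType().getName().equals("jdk.Deoptimization")
                        && event.hasField("reason") // field name is an assumption
                        && "null_check".equals(event.getString("reason"))) {
                    sawNullCheck = true;
                }
            }
            if (!sawNullCheck) {
                throw new AssertionError("expected a deoptimization event with reason null_check");
            }
        }
    }
}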
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8129092 >> webrev: http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 >> testing: compiler/intrinsics/classcast/NullCheckDroppingsTest.java >> Thanks, >> -- Igor From tobias.hartmann at oracle.com Wed Dec 4 05:59:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 4 Dec 2019 06:59:19 +0100 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> Message-ID: <712012d5-9ec8-ca3f-ff2d-801c15fb2851@oracle.com> Thanks Vladimir! Best regards, Tobias On 03.12.19 19:18, Vladimir Kozlov wrote: > +1 > > Vladimir K > > On 12/3/19 7:16 AM, Vladimir Ivanov wrote: >> >>> http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ >> >> Looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> >>> We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because >>> 'macro_idx' is out of the upper bound: >>> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 >>> >>> The problem is that in the previous iteration, we hit the following code path which removes an >>> unreachable macro node but does not decrement 'macro_idx': >>> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 >>> >>> The problem was introduced in JDK 14 by the fix for JDK-8227384: >>> https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 >>> >>> I've changed the loop to a for-loop to make sure the index is always decremented and also >>> strengthened the asserts. I was not able to create a regression test but the issue reproduces >>> intermittently with Lucene. >>> >>> Thanks, >>> Tobias >>> From Pengfei.Li at arm.com Wed Dec 4 06:25:30 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Wed, 4 Dec 2019 06:25:30 +0000 Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Message-ID: Thanks Tobias. Could anyone help push this? It's now reviewed by adinn, aph and thartmann. > Hi, > > this looks good to me. > > Best regards, > Tobias > > On 30.11.19 02:02, Ioi Lam wrote: > > Hi Pengfei, > > > > I have cc-ed hotspot-compiler-dev at openjdk.java.net. > > > > Please do not push the patch until someone from hotspot-compiler-dev > has looked at it. > > > > Many people are away due to Thanksgiving in the US. > > > > Thanks > > - Ioi > > > > On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote: > >> Hi, > >> > >> Please help review this small fix for 64-bit client build. > >> > >> Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ > >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 > >> > >> Current 64-bit client VM build fails because errors occurred in > >> dumping the CDS archive. In JDK 12, we enabled "Default CDS > >> Archives"[1] which runs "java -Xshare:dump" after linking the JDK > >> image. But for Client VM build on 64-bit platforms, the ergonomic > >> flag UseCompressedOops is not set.[2] This leads to VM exits in > >> checking the flags for dumping the shared archive.[3] > >> > >> This change removes the "#if defined" macro to make shared archive > >> dump successful in 64-bit client build. 
By tracking the history of > >> the macro, I found it is initially added as "#ifndef COMPILER1"[4] 10 > >> years ago when C1 did not have a good support of compressed oops and > >> modified to current shape[5] in the implementation of tiered > >> compilation. It should be safe to be removed today. > >> > >> This patch also fixes another client build issue on AArch64. > >> > >> [1] http://openjdk.java.net/jeps/341 > >> [2] > >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/shar > >> e/runtime/arguments.cpp#l1694 > >> [3] > >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/shar > >> e/runtime/arguments.cpp#l3551 [4] > >> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 > >> [5] > >> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 -- Thanks, Pengfei From tobias.hartmann at oracle.com Wed Dec 4 07:09:45 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 4 Dec 2019 08:09:45 +0100 Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Message-ID: <8315aeca-f0cb-b41e-782a-28e545772d09@oracle.com> On 04.12.19 07:25, Pengfei Li (Arm Technology China) wrote: > Thanks Tobias. Could anyone help push this? It's now reviewed by adinn, aph and thartmann. Sure, pushed. Best regards, Tobias From xxinliu at amazon.com Wed Dec 4 07:23:06 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 4 Dec 2019 07:23:06 +0000 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array In-Reply-To: References: Message-ID: Hi, Andrew and Martin, The slowpath I refer to is the JRT new_type_array. https://hg.openjdk.java.net/jdk/jdk/file/a1802614d6fe/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#l855 Martin is right. the JRT function is actually guarded by a mfence. Thank you to point it out! My previous understanding was wrong. I acknowledge that the extra "membar(StoreStore)" is unnecessary when the control flow returns from the slow path. I withdraw my webrev. My fix can solve our problem on jdk8u and jdk11, but it is too lame. I found that the root cause is JDK-8233839, so I marked my bug 'duplication'. I will propose to backport JDK-8233839 to jdk8u-aarch64. On the other side, can I say C2 has a redundant membar(StoreStore). Is it worth optimizing it? The branch at 0x0000ffff8056bef0 of https://bugs.openjdk.java.net/secure/attachment/85882/hotspot_pid4218.c2.log is the return from slow-path Thanks, --lx ?On 12/3/19, 8:18 AM, "aph at redhat.com" wrote: On 12/3/19 6:00 AM, Liu, Xin wrote: > C1 misses a member for the slow path of alloc_array or alloc_obj on aarch64. I don't understand this. In the slow path, the MemAllocator::finish should insert the necessary membar. Can you show us the slow path which does not do the membar? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christoph.goettschkes at microdoc.com Wed Dec 4 09:49:55 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Wed, 4 Dec 2019 10:49:55 +0100 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: References: <20191127125735.B9BE111F377@aojmv0009> <20191128124549.BCDF0C6F24@aojmv0009> <20191203122444.7D54F13FFFC@aojmv0009> Message-ID: Hi Vladimir, thanks for the explanation and for pointing to the code. 
I integrated your suggestions into this new webrev: https://cr.openjdk.java.net/~cgo/8234906/webrev.02/ I already created the changeset with you as a reviewer, so if you are fine with this version, may I ask you to sponsor it and commit it into the repository for me? https://cr.openjdk.java.net/~cgo/8234906/webrev.02/jdk-jdk.changeset Thanks, Christoph Vladimir Kozlov wrote on 2019-12-03 19:33:52: > From: Vladimir Kozlov > To: christoph.goettschkes at microdoc.com > Cc: hotspot-compiler-dev at openjdk.java.net > Date: 2019-12-03 19:34 > Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > client VMs due to Unrecognized VM option LoopUnrollLimit > > You don't need to duplicate @bug (it was C2 bug anyway). And don't need > to check & !vm.graal.enabled for C2 case (see > explanation below) > > > Could you elaborate how the two flags are related? I though, if graal is > > used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set > > to true. Is that correct? I don't have a setup with graal, so I can not > > test this. > > When we enable Graal JIT it is used instead of C2. They are mutually exclusive. > Compiler.isC2Enabled() (which sets vm.compiler2.enabled [1]) returns > false when isGraalEnabled() returns true [2]. > > Regards, > Vladimir > > [1] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/jtreg- > ext/requires/VMProps.java#l117 > [2] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/lib/sun/ > hotspot/code/Compiler.java#l80 > > > On 12/3/19 4:22 AM, christoph.goettschkes at microdoc.com wrote: > > Hi Vladimir, > > > > could you have a look at my updated webrev regarding this failing test? > > https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > > > > See my inline comments in the mail below. > > > > Thanks, > > Christpoh > > > > "hotspot-compiler-dev" > > wrote on 2019-11-28 13:44:05: > > > >> From: christoph.goettschkes at microdoc.com > >> To: Vladimir Kozlov > >> Cc: hotspot-compiler-dev at openjdk.java.net > >> Date: 2019-11-28 13:46 > >> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > >> client VMs due to Unrecognized VM option LoopUnrollLimit > >> Sent by: "hotspot-compiler-dev" > > > >> > >> Hi Vladimir, > >> > >> Vladimir Kozlov wrote on 2019-11-27 > > 20:54:02: > >> > >>> From: Vladimir Kozlov > >>> To: christoph.goettschkes at microdoc.com, > >> hotspot-compiler-dev at openjdk.java.net > >>> Date: 2019-11-27 20:54 > >>> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > > > >>> client VMs due to Unrecognized VM option LoopUnrollLimit > >>> > >>> Hi Christoph > >>> > >>> I was about suggest IgnoreUnrecognizedVMOptions flag but remembered > >>> discussion about 8231954 fix. > >> > >> Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our > > previous > >> discussion. I also think that it doesn't make sense to execute tests in > > VM > >> configurations for which they are not written for. Most of the compiler > >> tests simply have "IgnoreUnrecognizedVMOptions" and probably waste a > > good > >> amount of time in certain VM configurations. > >> > >>> But I think the test should be run with Graal - it does have OSR > >>> compilation and we need to test it. > >> > >> Sure. I disabled it, because I thought that the flag "LoopUnrollLimit" > > is > >> required to trigger the faulty behavior, but I don't know much about > >> optimization in the graal JIT. 
> >> > >>> > >>> We can do it by splitting test runs (duplicate @test block with > >>> different run flags) to have 2 tests with different > >>> flags and conditions. See [1]. > >>> > >>> For existing @run block we use `@requires vm.compiler2.enabled` and > > for > >>> new without LoopUnrollLimit - `vm.graal.enabled`. > >> > >> I did the following: > >> > >> https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > >> > >> Could you elaborate how the two flags are related? I though, if graal is > > > >> used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are > > set > >> to true. Is that correct? I don't have a setup with graal, so I can not > >> test this. > >> > >> Thanks, > >> Christoph > >> > > > From claes.redestad at oracle.com Wed Dec 4 10:18:26 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 4 Dec 2019 11:18:26 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <3A61B685-1311-48D2-92F8-0ABE95F9EFD5@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <3A61B685-1311-48D2-92F8-0ABE95F9EFD5@oracle.com> Message-ID: <3b9b77dc-7a2d-8365-e060-30d13ba355a9@oracle.com> On 2019-12-04 03:28, John Rose wrote: > On Dec 3, 2019, at 3:38 PM, Claes Redestad > wrote: >> >> http://cr.openjdk.java.net/~redestad/8234331/open.03 > > Nice work! Thanks! > > A few minor complaints follow: > > I?m not totally comfortable with the proliferation of tiny header files. > Are we crashing from one extreme (globalDefinitions.hpp) toward the > other (usefulZeroConstant.hpp)? ?So having?count_leading_zeros.hpp > is arguably a correction, but then adding?powerOfTwo.hpp feels like > an over-correction. ?To be followed by more over-corrections: > count_trailing_zeroes.hpp, populationCount.hpp, and so on. > > You won?t be surprised to hear that I think this would be excessive > splitting, and that a lumpier outcome would feel better to me, as a > successor to the chaos globalDefinitions.hpp. ?I propose putting > anything involving integral bitmask operations into intBitmasks.hpp, > including today?s count_trailing_zeroes.hpp and all the new stuff > that will be like it. ?Start by renaming count_trailing_zeroes.hpp and > rolling the new stuff into it. ?I can wait for this, but I?m going to > grumble > some more if we keep going down this road to TinyHeaderVille. I hear microservices is all the rage these days. :-) I can agree that one-header-per-logical-function would become excessive, but I think organizing functions into more logical units needs to be done iteratively as patterns emerge. > > This particular code : > + ? ?if (high != 0) { > + ? ? ?_BitScanReverse(&index, x); > + ? ?} else { > + ? ? ?_BitScanReverse(&index, x); > + ? ? ?index += 32; > + ? ?} > > is clearly untested, since it?s wrong. The `index+=32` belongs on the other > branch. ?(Or is it me?) You're both right and wrong: yes, this block is untested (we don't have 32-bit Windows builds in our test system). And the code is/was wrong, since the statement in the first branch should read: _BitScanReverse(&index, high); // high, not x However, the += 32 is in the right place: when the upper 32-bit word of a 64-bit is non-zero, the number of leading zeros is in the [0-31] range, but if it's in the lower 32-bit word the number of leading zeros is in the [32-63] range. _BitScanReverse only operates on a 32-bit int. > The root cause of this is the needless obfuscation > of the `index ^= 31` and `index ^= 63`. ?Really?? 
Just say `31 - index` and > `63 - index`, and the arithmetic reasoning will take care of itself. I agree this is a bit obscure (a habit I probably picked up from Hacker's Delight and other sources), and will go with 31u|63u - index. > > In the new code, `assert(lz > ...)` and `assert(lz < ...)` statements are in an > unpredictable relative order, making it harder to compare and contrast the > four versions. Suggest putting the < before the > case more regularly. Fixed. > > To `next_power_of_2` I suggest adding an assert that the increment > will not overflow. The asserts won't catch `next_power_of_2(max_jint)`, > not for sure. Yes, adding a check for the max value should be sufficient, since other overflows are asserted against in round_up: assert(value != std::numeric_limits<T>::max(), "Overflow"); Updating in place and rerunning gtests on all platforms. Thanks! /Claes > > -- John B. Twiddler > From vladimir.x.ivanov at oracle.com Wed Dec 4 12:40:21 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 4 Dec 2019 15:40:21 +0300 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations In-Reply-To: References: Message-ID: > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ Looks good. Best regards, Vladimir Ivanov From thomas.stuefe at gmail.com Wed Dec 4 16:32:44 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 4 Dec 2019 17:32:44 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: Hi Claes, I think this is very good and useful. General remarks: - Pity that it only works for 32/64bit values. It would be neat to have variants for 8 and 16 bit values too. - +1 for a "lumpier" header. - the round_down() variants: would it make sense to allow 0 as a valid input? I can imagine scenarios where 0 could be a valid input and result for this operation. - I tested on AIX, builds and gtests run through without errors. ---- http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/runtime/threadSMR.cpp.udiff.html + int hash_table_size = round_up_power_of_2(MIN2((int)get_java_thread_list()->length(), 32) << 1); Style nit, I would have preferred this to be two operations for readability. http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/powerOfTwo.hpp.html 43 inline typename EnableIf<IsSigned<T>::value, T>::type round_down_power_of_2(T value) { ... 47 assert(lz < sizeof(T) * BitsPerByte, "Sanity"); Do we need this if we check the input value for >0 ? (Same question for the signed version of round_up() below). 48 assert(lz > 0, "Overflow to negative"); Same here? Also, small nit, the assert text is slightly off since no overflow happened yet, but the input value is negative. (The asserts do not bother me much, I am just trying to understand why we need them) - 100 // Accepts 0 (returns 1); overflows if value is larger than or equal to 2^31 101 // or 2^63, for 32- and 64-bit integers, respectively Comment is a bit off. True only for unsigned values. Also it should mention that overflow results in an assert, to be in sync with the other comments. ---- http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html template <typename T> inline uint32_t count_leading_zeros(T x) { Instead of all the "if sizeof(T) == 4|8", could we specialize the functions for different operand sizes?
Example:

template <typename T, size_t S> struct Impl;
template <typename T> struct Impl<T, 8> { static int doit(T v) { return __builtin_clzll(v); } };
template <typename T> struct Impl<T, 4> { static int doit(T v) { return __builtin_clz(v); } };

template <typename T> int count_leading_zero(T v) { return Impl<T, sizeof(T)>::doit(v); }

This would make it easier later to plug in implementations for different sizes, and we'd still get a compile time error for invalid input types. - I wondered why count_leading_zeros did not allow 0 as input until I saw that __builtin_clz(ll) unfolds to bsr; and that does not work for input=0. I still find it an artificial limitation but that was there before your patch. ---- Thank you for the enhanced comment in the fallback implementation, that's helpful. ---- http://cr.openjdk.java.net/~redestad/8234331/open.03/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html It bothers me a bit that we do not test invalid input and overflow. There is a way, I believe, to have gtests expect an assert. We have TEST_VM_ASSERT but I am not sure if that is a good idea since crashing takes time. I also see that we only use it at a single place, so I guess others share the same concern :) Alternatively, one could have variants of round_up/down which return a definite value on overflow. I wonder whether this may be useful someplace outside the tests. - Just a thought, but could you cut down the code for the many TEST functions if you would use a templatized test function? Basically, something like

template <typename T> struct MyMaxImpl;
template <> struct MyMaxImpl<int32_t> { static int32_t get() { return int32_max_pow2; } };
.. etc

template <typename T> void round_up_test() {
  EXPECT_EQ(round_up_power_of_2((T)1), (T)1) << "value = " << 1;
  ...
  const T max = MyMaxImpl<T>::get();
  EXPECT_EQ(round_up_power_of_2(max - 1), max)
  ...
}

TEST(power_of_2, round_up_power_of_2_signed_32) { round_up_test<int32_t>(); }

That way the test would also be easily extendable later if we decide to add support for uint8_t or uint16_t. Thanks, Thomas On Wed, Dec 4, 2019 at 12:36 AM Claes Redestad wrote: > Hi Thomas and others, > > thanks for the thorough feedback! > > I also had offline discussions about the existing utilities in > zUtils.inline.hpp with Per Liden, and decided to try and work this into > a templated solution that enables both 32-bit and 64-bit > implementations to work correctly - along with signed variants > (especially nice with types such as size_t). > > http://cr.openjdk.java.net/~redestad/8234331/open.03 > > This is a pretty significant rework. Changes since .01: > > - introduce utilities/powerOfTwo.hpp > - move round_up_* and round_down_* from zUtils, re-implement to use > the new implementation > - implement next_* as round_up_*(value + 1) as per John's suggestion. > - added tests > - corrected the issues Thomas, Ivan and others pointed out, mainly > ensuring we don't depend on undefined behavior in neither product > code, asserts nor tests > - .. and ensure to use next and round_up as appropriate to preserve > behavior > > Other notes: > > - Moving existing power-of-two functions like is_power_of_2 from > globalDefinitions.hpp to powerOfTwo.hpp would be straightforward, but > tedious. I'd like to defer this to a follow-up > - The xlc implementation is untested, but should work. If someone can > verify, I'd be much obliged.
> - Many thanks to Erik ?sterlund, who guided me through a maze of > undefined behavior and template metaprogramming to a workable and > relatively clean implementation > - HotSpot shrinks by ~15Kb :-) > > Testing: tier1-5 > > On 2019-11-28 08:34, Thomas St?fe wrote: > > Hi Claes, > > > > I think this is useful. Why not a 64bit variant too? If you do not want > > to go through the hassle of providing a count_leading_zeros(uint64_t), > > you could call the 32bit variant twice and take care of endianness for > > the caller. > > > > -- > > > > In inline int32_t next_power_of_two(int32_t value) , should we weed out > > negative input values right away instead of asserting at the end of the > > function? > > > > -- > > > > The functions will always return the next power of two, even if the > > input is a power of two - e.g. "2" for "1". Is that intended? It would > > be nice to have an API comment in the header describing these corner > > cases (what happens for negative input, what happens if input is power > 2). > > > > -- > > > > The patch can cause subtle differences in some caller code, I think, if > > input value is a power of 2 already. See e.g: > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html > > > > - i=16; > > - while( i < size ) i <<= 1; > > + i = MAX2(16, (int)next_power_of_two(size)); > > > > If i == size == 16, old code would keep i==16, new code would come to > > i==32, I think. > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html > > > > > //------------------------------round_up--------------------------------------- > > // Round up to nearest power of 2 > > -uint NodeHash::round_up( uint x ) { > > - x += (x>>2); // Add 25% slop > > - if( x <16 ) return 16; // Small stuff > > - uint i=16; > > - while( i < x ) i <<= 1; // Double to fit > > - return i; // Return hash table size > > +uint NodeHash::round_up(uint x) { > > + x += (x >> 2); // Add 25% slop > > + return MAX2(16U, next_power_of_two(x)); > > } > > > > same here. If x == 16, before we'd return 16, now 32. > > > > --- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > > > I admit I do not understand the current coding :) I do not believe it > > works for all input values, e.g. were > > get_java_thread_list()->length()==1025, we'd get 1861 - if I am not > > mistaken. Your code is definitely clearer but not equivalent to the old > one. > > FTR, the algorithm used is described here: > > http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2 > > (1025 should round up to 2048) > > Thanks! > > /Claes > > > > > --- > > > > In the end, I wonder whether we should have two kind of APIs, or a > > parameter, distinguishing between "next power of 2" and "next power of 2 > > unless input value is already power of 2". > > > > Cheers, Thomas > > > > > > > > > > > > On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad > > > wrote: > > > > Hi, > > > > in various places in the hotspot we have custom code to calculate the > > next power of two, some of which have potential to go into an > infinite > > loop in case of an overflow. > > > > This patch proposes adding next_power_of_two utility methods which > > avoid infinite loops on overflow, while providing slightly more > > efficient code in most cases. 
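As background: the overflow hazard referred to above is the classic doubling loop. A minimal stand-alone sketch contrasting it with a clz-based round-up (illustrative only, not code from any of the webrevs in this thread; assumes GCC/Clang __builtin_clz):

#include <cstdint>

// Doubling-loop idiom: for x greater than 2^31, i eventually wraps to 0
// and the loop never terminates.
static uint32_t round_up_loop(uint32_t x) {
  uint32_t i = 1;
  while (i < x) i <<= 1;
  return i;
}

// clz-based idiom: constant time, and the overflow case (x > 2^31) can be
// rejected with an explicit assert by the caller instead of hanging.
static uint32_t round_up_clz(uint32_t x) {
  if (x <= 1) return 1;
  return 1u << (32 - __builtin_clz(x - 1));
}

// Example: round_up_loop(1000) == round_up_clz(1000) == 1024,
// and both return 1024 unchanged for an input of 1024.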
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234331 > > Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/ > > > > Testing: tier1-3 > > > > Thanks! > > > > /Claes > > > From sandhya.viswanathan at intel.com Wed Dec 4 17:01:52 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 4 Dec 2019 17:01:52 +0000 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations In-Reply-To: <3C7A0114-AD7B-4032-8D50-0F2D31EB97BC@oracle.com> References: <3C7A0114-AD7B-4032-8D50-0F2D31EB97BC@oracle.com> Message-ID: Hi Vladimir, Could you please sponsor this patch? Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Tuesday, December 03, 2019 2:41 PM To: Viswanathan, Sandhya Cc: hotspot compiler Subject: Re: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations Looks good. Thanks Vladimir > On Dec 3, 2019, at 1:33 PM, Viswanathan, Sandhya wrote: > > For vector replicate and reduction operations vinsert and vextract instructions are used. > When UseAVX level is set to 3, these instructions are unnecessarily encoded with 512-bit vector width. > Only for KNL platform which doesn't support AVX512 variable length encoding, the 512-bit wide instruction need to be used. > All other Xeon platforms should use the appropriate 256-bit wide vector instruction. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235288 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From claes.redestad at oracle.com Wed Dec 4 17:37:05 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 4 Dec 2019 18:37:05 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: <39475089-9933-f85a-8927-661bba44780c@oracle.com> Hi Thomas, On 2019-12-04 17:32, Thomas St?fe wrote: > Hi Claes, > > I think this is very good and useful. > > General remarks: > > - Pity that it only works for 32/64bit values. It would be neat to have > variants for 8 and 16 bit values too. I don't think adding 8- and 16-bit variants would be too hard, but would need specialized adjustment since most platforms only have 32- and 64- bit intrinsics for clz. Do you know any code where we might need such specializations...? > > - +1 for a "lumpier" header. o_O > > - the round_down() variants: would it make sense to allow 0 as valid > input? I can imagine scenarios where 0 could be a valid input and result > for this operation. I've been contemplating if there's room for variants that are less strict, e.g., where you can define what values you should get on 0 or overflow. I think that's out of scope for this RFE. > > - I tested on AIX, builds and gtests run through without errors. Great, thanks! > > ---- > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > + ?int hash_table_size = > round_up_power_of_2(MIN2((int)get_java_thread_list()->length(), 32) << 1); > > Style nit, I would have preferred this to be two operations for > readability. Sure, will fix. > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/powerOfTwo.hpp.html > > ? 43 inline typename EnableIf::value, T>::type > round_down_power_of_2(T value) { > ... > > ? 47 ? 
assert(lz < sizeof(T) * BitsPerByte, "Sanity"); > > Do we need this if we check the input value for >0 ? (Same question for > the signed version of round_up() below). Technically we shouldn't need any of the "Sanity" asserts that check that lz is in the expected range ([0-31]/[0-63]). I added them mostly as scaffolding when writing these functions, but kept them since I think they help reason about the code. I can remove them if you insist. > > ? 48 ? assert(lz > 0, "Overflow to negative"); > > Same here? Right, for round_down this one is pointless lz == 0 would only be possible if value > 0. I'll remove this one. For round_up we overflow when value > 2^30, so lz == 1, which is not covered by value > 0. So that assert I think is needed. > Also, small nit, the assert text is slightly off since no overflow > happened yet, but the input value is negative. Maybe just: assert(lz > 1, "Will overflow"); > > (The asserts do not bother me much, I am just trying to understand why > we need them) To catch obvious programming mistakes, catch more issues during testing, and reason about the code. > > - > > ?100 // Accepts 0 (returns 1); overflows if value is larger than or > ?equal to 2^31 > ?101 // or 2^63, for 32- and 64-bit integers, respectively > > Comment is a bit off. True only for unsigned values. Also it should > mention that overflow results in assert to be in sync with the other > comments. How about: // Accepts 0 (returns 1), overflows with assert if value is larger than // or equal to 2^31 (2^30 for signed) or 2^63 (2^62 if signed), for 32- // and 64-bit integers, respectively > > ---- > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html > > template inline uint32_t count_leading_zeros(T x) { > > Instead of all the "if sizeof(T) == 4|8", could we specialize the > functions for different operand sizes? Example: > > template struct Impl; > template struct Impl { static int doit(T v) { return > __builtin_clzll(v); } }; > template struct Impl { static int doit(T v) { return > __builtin_clz(v); } }; > template int count_leading_zero(T v) { return Impl sizeof(T)>::doit(v); } > > Would make it easier later to plug in implementations for different > sizes, and we'd still get a compile time error for invalid input types. I guess we could, but I'm not so sure it'd help readability. > > - > > I wondered why count_leading_zeros did not allow 0 as input until I saw > that __builtin_clz(ll) unfolds to bsr; and that does not work for > input=0. I still find it an artificial limitation but that was there > before your patch. Yes, it's a bit unfortunate and artificial, and I think we'd regress some actually performance-critical places by adding a special-case for 0. If we want graceful versions that give 0 and max on over- and underflow we can add them in separately, I think. > > ---- > > Thank you for the enhanced comment in the fallback implementation, thats > helpful. > > ---- > > http://cr.openjdk.java.net/~redestad/8234331/open.03/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > It bothers me a bit that we do not test invalid input and overflow. > There is a way, I believe, to have gtests expect an assert. We have > TEST_VM_ASSERT but I am not sure if that is a good idea since crashing > takes time. I also see that we only use it at a single place, so I guess > others share the same concern :) > > Alternatively, one could have variants of round_up/down which return a > definite value on overflow. 
I wonder whether this may be useful > someplace outside the tests. Yeah, a lighter mechanism for testing asserts would be great. I'd prefer deferring negative tests to a follow-up, though, since I have a few other things to attend to before 14 FC... :-) > > - > > Just a thought, but could you cut down the code for the many TEST > functions if you would use a templatized test function? Basically, > something like > > template struct MyMaxImpl; > template MyMaxImpl ? { static int32 get() { return > int32_max_pow2; } }; > .. etc > > template void round_up_test() { > ? EXPECT_EQ(round_up_power_of_2((T)1), (T)1) << "value = " << 1; > ... > ? ? const T max = MyMaxImpl::get(); > ? EXPECT_EQ(round_up_power_of_2(max - 1), max) > ... > } > > TEST(power_of_2, round_up_power_of_2_signed_32) { > ? round_up_do_test(); > } > > That way the test would also be easily extendable later if we decide to > add support for uint8_t or uint16_t. We might save a few lines of code, yes, but my preference is keeping tests explicit even if that makes them somewhat more repetitive - to a point. I can see if we were to add int8/int16 variants that this might move well beyond that point, though. Thanks! /Claes From claes.redestad at oracle.com Thu Dec 5 01:32:55 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 02:32:55 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <39475089-9933-f85a-8927-661bba44780c@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi again, On 2019-12-04 18:37, Claes Redestad wrote: >> >> >> http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html >> >> >> template inline uint32_t count_leading_zeros(T x) { >> >> Instead of all the "if sizeof(T) == 4|8", could we specialize the >> functions for different operand sizes? Example: >> >> template struct Impl; >> template struct Impl { static int doit(T v) { >> return __builtin_clzll(v); } }; >> template struct Impl { static int doit(T v) { >> return __builtin_clz(v); } }; >> template int count_leading_zero(T v) { return Impl> sizeof(T)>::doit(v); } >> >> Would make it easier later to plug in implementations for different >> sizes, and we'd still get a compile time error for invalid input types. > > I guess we could, but I'm not so sure it'd help readability. after some considerations, I took this idea for a spin, along with implementations and tests for 8 and 16-bit for completeness up front: http://cr.openjdk.java.net/~redestad/8234331/open.04 I also turned tests into template functions etc. I think it turned out an improvement. Testing: tier1-3 (some ongoing) Thanks! /Claes From tobias.hartmann at oracle.com Thu Dec 5 08:56:57 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 5 Dec 2019 09:56:57 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: Hi Vladimir, this looks good to me. On 29.11.19 16:42, Vladimir Ivanov wrote: > It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. Is the remaining regression due to JDK-8226396? 
Best regards, Tobias From vladimir.x.ivanov at oracle.com Thu Dec 5 09:00:15 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 12:00:15 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: <4c2c64dd-e339-1b86-d6ce-639cd9565308@oracle.com> Thanks, Tobias. >> It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. > > Is the remaining regression due to JDK-8226396? Yes. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Thu Dec 5 09:03:57 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 5 Dec 2019 10:03:57 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <4c2c64dd-e339-1b86-d6ce-639cd9565308@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <4c2c64dd-e339-1b86-d6ce-639cd9565308@oracle.com> Message-ID: <64515d64-b198-27f0-b1e0-5e87985cb954@oracle.com> On 05.12.19 10:00, Vladimir Ivanov wrote: >> Is the remaining regression due to JDK-8226396? > > Yes. Okay, thanks for confirming. Best regards, Tobias From vladimir.x.ivanov at oracle.com Thu Dec 5 09:53:28 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 12:53:28 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information Message-ID: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234392 Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() on x86. It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see Matcher::vector_width_in_bytes() [1] for details). Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Testing: tier1-4 Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 From thomas.stuefe at gmail.com Thu Dec 5 10:25:38 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 5 Dec 2019 11:25:38 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <39475089-9933-f85a-8927-661bba44780c@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, answers to your remarks inline. I will post the review for v0.4 separately. On Wed, Dec 4, 2019 at 6:36 PM Claes Redestad wrote: > Hi Thomas, > > On 2019-12-04 17:32, Thomas St?fe wrote: > > Hi Claes, > > > > I think this is very good and useful. > > > > General remarks: > > > > - Pity that it only works for 32/64bit values. It would be neat to have > > variants for 8 and 16 bit values too. > > I don't think adding 8- and 16-bit variants would be too hard, but would > need specialized adjustment since most platforms only have 32- and 64- > bit intrinsics for clz. Do you know any code where we might need such > specializations...? > > Not offhand, no. Just an urge to completeness. > > > > - +1 for a "lumpier" header. 
> > o_O > > Referring to Johns earlier mail about lumping the utility functions together in a single header instead of having micro headers for every function. > > > - the round_down() variants: would it make sense to allow 0 as valid > > input? I can imagine scenarios where 0 could be a valid input and result > > for this operation. > > I've been contemplating if there's room for variants that are less > strict, e.g., where you can define what values you should get on > 0 or overflow. I think that's out of scope for this RFE. > > > > > - I tested on AIX, builds and gtests run through without errors. > > Great, thanks! > > > > > ---- > > > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > > > + int hash_table_size = > > round_up_power_of_2(MIN2((int)get_java_thread_list()->length(), 32) << > 1); > > > > Style nit, I would have preferred this to be two operations for > > readability. > > Sure, will fix. > > > > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/powerOfTwo.hpp.html > > > > 43 inline typename EnableIf::value, T>::type > > round_down_power_of_2(T value) { > > ... > > > > 47 assert(lz < sizeof(T) * BitsPerByte, "Sanity"); > > > > Do we need this if we check the input value for >0 ? (Same question for > > the signed version of round_up() below). > > Technically we shouldn't need any of the "Sanity" asserts that check > that lz is in the expected range ([0-31]/[0-63]). I added them mostly as > scaffolding when writing these functions, but kept them since I think > they help reason about the code. I can remove them if you insist. > > No, not necessary. Just wondered if I missed something. > > > > 48 assert(lz > 0, "Overflow to negative"); > > > > Same here? > > Right, for round_down this one is pointless lz == 0 would only be > possible if value > 0. I'll remove this one. > > For round_up we overflow when value > 2^30, so lz == 1, which is not > covered by value > 0. So that assert I think is needed. > > Oh, right. > > Also, small nit, the assert text is slightly off since no overflow > > happened yet, but the input value is negative. > > Maybe just: > > assert(lz > 1, "Will overflow"); > Thanks. > > > > > (The asserts do not bother me much, I am just trying to understand why > > we need them) > > To catch obvious programming mistakes, catch more issues during testing, > and reason about the code. > > > > > - > > > > 100 // Accepts 0 (returns 1); overflows if value is larger than or > > equal to 2^31 > > 101 // or 2^63, for 32- and 64-bit integers, respectively > > > > Comment is a bit off. True only for unsigned values. Also it should > > mention that overflow results in assert to be in sync with the other > > comments. > > How about: > > // Accepts 0 (returns 1), overflows with assert if value is larger than > // or equal to 2^31 (2^30 for signed) or 2^63 (2^62 if signed), for 32- > // and 64-bit integers, respectively > Sounds good thank you. > > > > > ---- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html > > > > template inline uint32_t count_leading_zeros(T x) { > > > > Instead of all the "if sizeof(T) == 4|8", could we specialize the > > functions for different operand sizes? 
Example: > > > > template struct Impl; > > template struct Impl { static int doit(T v) { return > > __builtin_clzll(v); } }; > > template struct Impl { static int doit(T v) { return > > __builtin_clz(v); } }; > > template int count_leading_zero(T v) { return Impl > sizeof(T)>::doit(v); } > > > > Would make it easier later to plug in implementations for different > > sizes, and we'd still get a compile time error for invalid input types. > > I guess we could, but I'm not so sure it'd help readability. > > > > > - > > > > I wondered why count_leading_zeros did not allow 0 as input until I saw > > that __builtin_clz(ll) unfolds to bsr; and that does not work for > > input=0. I still find it an artificial limitation but that was there > > before your patch. > > Yes, it's a bit unfortunate and artificial, and I think we'd regress > some actually performance-critical places by adding a special-case for > 0. If we want graceful versions that give 0 and max on over- and > underflow we can add them in separately, I think. > > > > > ---- > > > > Thank you for the enhanced comment in the fallback implementation, thats > > helpful. > > > > ---- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > > > It bothers me a bit that we do not test invalid input and overflow. > > There is a way, I believe, to have gtests expect an assert. We have > > TEST_VM_ASSERT but I am not sure if that is a good idea since crashing > > takes time. I also see that we only use it at a single place, so I guess > > others share the same concern :) > > > > Alternatively, one could have variants of round_up/down which return a > > definite value on overflow. I wonder whether this may be useful > > someplace outside the tests. > > Yeah, a lighter mechanism for testing asserts would be great. I'd prefer > deferring negative tests to a follow-up, though, since I have a few > other things to attend to before 14 FC... :-) > > > > > - > > > > Just a thought, but could you cut down the code for the many TEST > > functions if you would use a templatized test function? Basically, > > something like > > > > template struct MyMaxImpl; > > template MyMaxImpl { static int32 get() { return > > int32_max_pow2; } }; > > .. etc > > > > template void round_up_test() { > > EXPECT_EQ(round_up_power_of_2((T)1), (T)1) << "value = " << 1; > > ... > > const T max = MyMaxImpl::get(); > > EXPECT_EQ(round_up_power_of_2(max - 1), max) > > ... > > } > > > > TEST(power_of_2, round_up_power_of_2_signed_32) { > > round_up_do_test(); > > } > > > > That way the test would also be easily extendable later if we decide to > > add support for uint8_t or uint16_t. > > We might save a few lines of code, yes, but my preference is keeping > tests explicit even if that makes them somewhat more repetitive - to a > point. I can see if we were to add int8/int16 variants that this might > move well beyond that point, though. > > Thanks! > > /Claes > Cheers, Thomas From thomas.stuefe at gmail.com Thu Dec 5 10:26:59 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 5 Dec 2019 11:26:59 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, this looks nice, thank you for your work. Only minor stuff remains, mostly matters of taste. Feel free to ignore them. 
The patch is already fine as it is for me. With your latest patch, AIX still works. --- http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html Taste matter: I find the return type for count_leading_zeros as uint32_t oddly specific. Instinctively I would have choosen int or unsigned. The gcc buildins all return int. --- Question, why do we treat signed numbers of 8 and 16 bits different wrt to values < 0? if you leave behavior as it is, could you please adjust the comment: // Return the number of leading zeros in x, e.g. the zero-based index // of the most significant set bit in x. Undefined for 0. + // For signed numbers whose size is < 32bit, function will + // return 0 for values < 0. ---- Taste matter: Windows, 64bit version for x86: You could call your own count_leading_zeros, instead of calling _BitScanReverse. ---- http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html In the test for next_up, can we have, for completeness sake, a test which tests that with any valid input pow2 the result is the next pow2? --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 02:28:58 2019 +0100 +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 11:08:47 2019 +0100 @@ -139,6 +139,11 @@ EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) << "value = " << pow2 - 1; } + // next(pow2) should return pow2 * 2 + for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { + EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) + << "value = " << pow2; + } --- Thanks, Thomas On Thu, Dec 5, 2019 at 2:32 AM Claes Redestad wrote: > Hi again, > > On 2019-12-04 18:37, Claes Redestad wrote: > >> > >> > >> > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html > >> > >> > >> template inline uint32_t count_leading_zeros(T x) { > >> > >> Instead of all the "if sizeof(T) == 4|8", could we specialize the > >> functions for different operand sizes? Example: > >> > >> template struct Impl; > >> template struct Impl { static int doit(T v) { > >> return __builtin_clzll(v); } }; > >> template struct Impl { static int doit(T v) { > >> return __builtin_clz(v); } }; > >> template int count_leading_zero(T v) { return Impl >> sizeof(T)>::doit(v); } > >> > >> Would make it easier later to plug in implementations for different > >> sizes, and we'd still get a compile time error for invalid input types. > > > > I guess we could, but I'm not so sure it'd help readability. > > after some considerations, I took this idea for a spin, along with > implementations and tests for 8 and 16-bit for completeness up > front: > > http://cr.openjdk.java.net/~redestad/8234331/open.04 > > I also turned tests into template functions etc. I think it turned > out an improvement. > > Testing: tier1-3 (some ongoing) > > Thanks! > > /Claes > From vladimir.x.ivanov at oracle.com Thu Dec 5 12:01:21 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 15:01:21 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations Message-ID: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235405 Reduce the number of AD instructions needed to implement vector operations by merging existing ones. 
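To make the shape of the change easier to picture, here is a plain C++ sketch of the general idea -- one handler deriving the encoding width from the vector length instead of one nearly identical rule per length. The names and the helper below are made up for illustration and are not taken from x86.ad or the webrev:

#include <cstdio>

// Map a vector length in bytes to an encoding width selector
// (0 = 128-bit, 1 = 256-bit, 2 = 512-bit); purely illustrative.
static int avx_vector_len(int length_in_bytes) {
  if (length_in_bytes <= 16) return 0;
  if (length_in_bytes <= 32) return 1;
  return 2;
}

// One emitter covering all supported lengths, instead of separate
// per-size variants that differ only in the width they request.
static void emit_vand(int length_in_bytes) {
  printf("vpand, vector_len=%d\n", avx_vector_len(length_in_bytes));
}

int main() {
  emit_vand(8);
  emit_vand(32);
  emit_vand(64);
  return 0;
}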
The patch covers the following operations: - LoadVector - StoreVector - RoundDoubleModeV - AndV - OrV - XorV - MulAddVS2VI - PopCountVI Indiviual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html From vladimir.x.ivanov at oracle.com Thu Dec 5 12:43:39 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 15:43:39 +0300 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic Message-ID: http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8235143 Thread::currentThread() intrinsic doesn't need memory state: though multiple threads can execute same code, "current thread" can't change in the context of a single method activation. So, once it is observed, it's safe to share among all users. One of the use cases which benefit a lot from such optimization is ownership checks for thread confined resources (fast path check for owner thread to avoid heavy-weight synchronization). The patch was part of foreign-memaccess branch in Project Panama and showed good performance results on Memory Access API implementation [1]. Testing: tier1-4 PS: the optimization should be disabled in Project Loom: the assumption doesn't hold for continuations (in their current form). Best regards, Vladimir Ivanov [1] https://openjdk.java.net/jeps/370 JEP 370: Foreign-Memory Access API (Incubator) From christian.hagedorn at oracle.com Thu Dec 5 13:22:16 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 5 Dec 2019 14:22:16 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive Message-ID: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8229994 http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ As a part of loop peeling, loop-invariant dominating tests are moved out of the loop [1]. Doing so requires to change all control inputs coming from an independent test 'test' to data nodes inside the loop body to use the peeled version 'peeled_test' of 'test' instead [2]: the control test->data will be changed to peeled_test->data. If a data node is expensive then an assertion [3] checks the dominance relation of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but fails to find 'peeled_test' as part of the idom chain starting from the previous control input 'test' of that expensive node. The reason is that the idom information was not correctly set before at [4] after creating the peeled nodes. If 'head' (a CountedLoop node) has another OuterStripMinedLoop node, let's say 'outer_head', as an input then 'outer_head' is set as idom of 'head' instead of setting the idom of 'outer_head' to the new loop entry from the peeled iteration. 
We then miss all the idom information of the peeled iteration and the idom of 'outer_head' still points to the old loop entry node before peeling. The fix is straight forward to also account for loop strip mined loops when correcting the idom to the new loop entry from the peeled iteration at [4]. Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 [2] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 [3] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 [4] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 From vladimir.x.ivanov at oracle.com Thu Dec 5 13:31:23 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 16:31:23 +0300 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Message-ID: <16ea8963-0403-0365-d105-b7e6bf7b90a6@oracle.com> Looks good. Best regards, Vladimir Ivanov On 05.12.2019 16:22, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8229994 > http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ > > As a part of loop peeling, loop-invariant dominating tests are moved out > of the loop [1]. Doing so requires to change all control inputs coming > from an independent test 'test' to data nodes inside the loop body to > use the peeled version 'peeled_test' of 'test' instead [2]: the control > test->data will be changed to peeled_test->data. If a data node is > expensive then an assertion [3] checks the dominance relation of 'test' > and 'peeled_test' ('peeled_test' should dominate 'test') but fails to > find 'peeled_test' as part of the idom chain starting from the previous > control input 'test' of that expensive node. > > The reason is that the idom information was not correctly set before at > [4] after creating the peeled nodes. If 'head' (a CountedLoop node) has > another OuterStripMinedLoop node, let's say 'outer_head', as an input > then 'outer_head' is set as idom of 'head' instead of setting the idom > of 'outer_head' to the new loop entry from the peeled iteration. We then > miss all the idom information of the peeled iteration and the idom of > 'outer_head' still points to the old loop entry node before peeling. > > The fix is straight forward to also account for loop strip mined loops > when correcting the idom to the new loop entry from the peeled iteration > at [4]. > > Thank you! 
> > Best regards, > Christian > > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 > > [2] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 > > [3] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 > > [4] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 > From christian.hagedorn at oracle.com Thu Dec 5 13:39:56 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 5 Dec 2019 14:39:56 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: <16ea8963-0403-0365-d105-b7e6bf7b90a6@oracle.com> References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> <16ea8963-0403-0365-d105-b7e6bf7b90a6@oracle.com> Message-ID: Thank you for your review Vladimir! Best regards, Christian On 05.12.19 14:31, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 05.12.2019 16:22, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8229994 >> http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ >> >> As a part of loop peeling, loop-invariant dominating tests are moved >> out of the loop [1]. Doing so requires to change all control inputs >> coming from an independent test 'test' to data nodes inside the loop >> body to use the peeled version 'peeled_test' of 'test' instead [2]: >> the control test->data will be changed to peeled_test->data. If a data >> node is expensive then an assertion [3] checks the dominance relation >> of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but >> fails to find 'peeled_test' as part of the idom chain starting from >> the previous control input 'test' of that expensive node. >> >> The reason is that the idom information was not correctly set before >> at [4] after creating the peeled nodes. If 'head' (a CountedLoop node) >> has another OuterStripMinedLoop node, let's say 'outer_head', as an >> input then 'outer_head' is set as idom of 'head' instead of setting >> the idom of 'outer_head' to the new loop entry from the peeled >> iteration. We then miss all the idom information of the peeled >> iteration and the idom of 'outer_head' still points to the old loop >> entry node before peeling. >> >> The fix is straight forward to also account for loop strip mined loops >> when correcting the idom to the new loop entry from the peeled >> iteration at [4]. >> >> Thank you! 
>> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 >> >> [2] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 >> >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 >> >> [4] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 >> From christian.hagedorn at oracle.com Thu Dec 5 14:39:12 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 5 Dec 2019 15:39:12 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be Message-ID: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233032 http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ The problem of the bug can be traced back to finding the first and last memory state of a load when processing a load pack in SuperWord::co_locate_pack() [1]. One load of a load pack in the test case is dependent on the store node 's' which is sandwiched between other store nodes 'p1' and 'p2' of a store pack (p1 -> s -> p2). The other load of the load pack is dependent on the store node 'p2'. The sandwiched store node 's' swaps positions with 'p2' to move it out of the pack dependencies: p1 -> p2 -> s. However, the bb indices are not updated (bb_idx(s) < bb_idx(p2) is still true). Therefore, it sets last_mem to the memory state of the first load in the loop at [1]. As a result, the graph walk at [2] always starts following the input of the memory state of the first load (which should actually be the one of the last load) and will move beyond a loop phi as the stop condition is never met for a node having another memory state than the first one of the load pack. Eventually a bb index for a node outside of the loop is read resulting in the seen assertion failure. I suggest to use a solution to look up the first and last memory state that does not use bb indices as it probably is quite difficult to update them properly when moving sandwiched stores around. I've seen that Roland once proposed such a solution [3][4] which he then replaced by a version using bb indices (causing this bug). I adopted Roland's original patch into mine and propose to use it as a fix. Thank you! Best regards, Christian [1] https://hg.openjdk.java.net/jdk/jdk/file/5defda391e18/src/hotspot/share/opto/superword.cpp#l2258 [2] https://hg.openjdk.java.net/jdk/jdk/file/5defda391e18/src/hotspot/share/opto/superword.cpp#l2276 [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-April/028702.html [4] http://cr.openjdk.java.net/~roland/8201367/webrev.00/ From claes.redestad at oracle.com Thu Dec 5 14:59:16 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 15:59:16 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: On 2019-12-05 11:26, Thomas St?fe wrote: > Hi Claes, > > this looks nice, thank you for your work. Only minor stuff remains, > mostly matters of taste. Feel free to ignore them. The patch is already > fine as it is for me. > > With your latest patch, AIX still works. Great! 
> > --- > > http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html > > Taste matter: I find the return type for count_leading_zeros as uint32_t > oddly specific. Instinctively I would have choosen int or unsigned. The > gcc buildins all return int. No particular reason. > > --- > > Question, why do we treat signed numbers of 8 and 16 bits different wrt > to values < 0? > > if you leave behavior as it is, could you please adjust the comment: > > ?// Return the number of leading zeros in x, e.g. the zero-based index > ?// of the most significant set bit in x.? Undefined for 0. > + // For signed numbers whose size is < 32bit, function will > + // return 0 for values < 0. Such a comment is a bit redundant since count_leading_zeros will return 0 for any negative input. Maybe we should spell that out more clearly, though. The special cases avoided issues with sign extension when upcasting, e.g, casting int8_t -1 to unsigned returns 0xFFFFFFFF, so __builtin_clz((int8_t)-1) returns 0, which then messes up the manual adjustment. But this can be further simplified by masking, avoiding an extra branch: template struct CountLeadingZerosImpl { static uint32_t doit(T v) { return __builtin_clz((uint32_t)v & 0xFF) - 24u; } }; template struct CountLeadingZerosImpl { static uint32_t doit(T v) { return __builtin_clz((uint32_t)v & 0xFFFF) - 16u; } }; > > ---- > > Taste matter: Windows, 64bit version for x86: You could call your own > count_leading_zeros, instead of calling _BitScanReverse. Sure. > > ---- > > http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > In the test for next_up, can we have, for completeness sake, a test > which tests that with any valid input pow2 the result is the next pow2? > > --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 > 02:28:58 2019 +0100 > +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 > 11:08:47 2019 +0100 > @@ -139,6 +139,11 @@ > ? ? ?EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) > ? ? ? ?<< "value = " << pow2 - 1; > ? ?} > + ?// next(pow2) should return pow2 * 2 > + ?for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { > + ? ?EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) > + ? ? ?<< "value = " << pow2; > + ?} Fixed. I also ran into an issue with use of std::numeric_limit::max() causing obscure issues on some platforms at some call sites (*cough* Solaris), so I replaced those usages with a portable (but not very efficient) version of that in place for the few assert/test uses: http://cr.openjdk.java.net/~redestad/8234331/open.05 Testing: tier1-3, some ongoing /Claes From tobias.hartmann at oracle.com Thu Dec 5 15:16:51 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 5 Dec 2019 16:16:51 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Message-ID: Hi Christian, looks good to me but the CompileCommand argument in the @run statement of the test should come before the test class name (no new webrev required). 
Best regards, Tobias On 05.12.19 14:22, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8229994 > http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ > > As a part of loop peeling, loop-invariant dominating tests are moved out of the loop [1]. Doing so > requires to change all control inputs coming from an independent test 'test' to data nodes inside > the loop body to use the peeled version 'peeled_test' of 'test' instead [2]: the control test->data > will be changed to peeled_test->data. If a data node is expensive then an assertion [3] checks the > dominance relation of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but fails to > find 'peeled_test' as part of the idom chain starting from the previous control input 'test' of that > expensive node. > > The reason is that the idom information was not correctly set before at [4] after creating the > peeled nodes. If 'head' (a CountedLoop node) has another OuterStripMinedLoop node, let's say > 'outer_head', as an input then 'outer_head' is set as idom of 'head' instead of setting the idom of > 'outer_head' to the new loop entry from the peeled iteration. We then miss all the idom information > of the peeled iteration and the idom of 'outer_head' still points to the old loop entry node before > peeling. > > The fix is straight forward to also account for loop strip mined loops when correcting the idom to > the new loop entry from the peeled iteration at [4]. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 > [2] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 > [3] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 > [4] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 From martin.doerr at sap.com Thu Dec 5 16:13:00 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 5 Dec 2019 16:13:00 +0000 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: Hi Vladimir, looks good to me. The j.l.Thread Oop for Thread::current() is allowed to live in a register with this change. GC will find it via OopMap at safepoints. So I think it's correct. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov > Sent: Donnerstag, 5. Dezember 2019 13:44 > To: hotspot compiler > Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in > Thread::currentThread() intrinsic > > http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235143 > > Thread::currentThread() intrinsic doesn't need memory state: > though multiple threads can execute same code, "current thread" can't > change in the context of a single method activation. So, once it is > observed, it's safe to share among all users. > > One of the use cases which benefit a lot from such optimization is > ownership checks for thread confined resources (fast path check for > owner thread to avoid heavy-weight synchronization). > > The patch was part of foreign-memaccess branch in Project Panama and > showed good performance results on Memory Access API implementation > [1]. 
> > Testing: tier1-4 > > PS: the optimization should be disabled in Project Loom: the assumption > doesn't hold for continuations (in their current form). > > Best regards, > Vladimir Ivanov > > [1] https://openjdk.java.net/jeps/370 > JEP 370: Foreign-Memory Access API (Incubator) From thomas.stuefe at gmail.com Thu Dec 5 17:35:40 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 5 Dec 2019 18:35:40 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, this looks all good to me. Thanks for your patience! I won't be able to test the latest iteration on AIX (just started vacation) but I am quite sure it will work since the delta to the last version is minimal; should it not we will fix it. Cheers, Thomas On Thu, Dec 5, 2019 at 3:56 PM Claes Redestad wrote: > On 2019-12-05 11:26, Thomas St?fe wrote: > > Hi Claes, > > > > this looks nice, thank you for your work. Only minor stuff remains, > > mostly matters of taste. Feel free to ignore them. The patch is already > > fine as it is for me. > > > > With your latest patch, AIX still works. > > Great! > > > > > --- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html > > > > Taste matter: I find the return type for count_leading_zeros as uint32_t > > oddly specific. Instinctively I would have choosen int or unsigned. The > > gcc buildins all return int. > > No particular reason. > > > > > --- > > > > Question, why do we treat signed numbers of 8 and 16 bits different wrt > > to values < 0? > > > > if you leave behavior as it is, could you please adjust the comment: > > > > // Return the number of leading zeros in x, e.g. the zero-based index > > // of the most significant set bit in x. Undefined for 0. > > + // For signed numbers whose size is < 32bit, function will > > + // return 0 for values < 0. > > Such a comment is a bit redundant since count_leading_zeros will return > 0 for any negative input. Maybe we should spell that out more clearly, > though. > > The special cases avoided issues with sign extension when upcasting, > e.g, casting int8_t -1 to unsigned returns 0xFFFFFFFF, so > __builtin_clz((int8_t)-1) returns 0, which then messes up the manual > adjustment. But this can be further simplified by masking, avoiding > an extra branch: > > template struct CountLeadingZerosImpl { > static uint32_t doit(T v) { > return __builtin_clz((uint32_t)v & 0xFF) - 24u; > } > }; > > template struct CountLeadingZerosImpl { > static uint32_t doit(T v) { > return __builtin_clz((uint32_t)v & 0xFFFF) - 16u; > } > }; > > > > > ---- > > > > Taste matter: Windows, 64bit version for x86: You could call your own > > count_leading_zeros, instead of calling _BitScanReverse. > > Sure. > > > > > ---- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > > > In the test for next_up, can we have, for completeness sake, a test > > which tests that with any valid input pow2 the result is the next pow2? 
> > > > --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 > > 02:28:58 2019 +0100 > > +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 > > 11:08:47 2019 +0100 > > @@ -139,6 +139,11 @@ > > EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) > > << "value = " << pow2 - 1; > > } > > + // next(pow2) should return pow2 * 2 > > + for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { > > + EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) > > + << "value = " << pow2; > > + } > > Fixed. > > I also ran into an issue with use of std::numeric_limit::max() > causing obscure issues on some platforms at some call sites (*cough* > Solaris), so I replaced those usages with a portable (but not very > efficient) version of that in place for the few assert/test uses: > > http://cr.openjdk.java.net/~redestad/8234331/open.05 > > Testing: tier1-3, some ongoing > > /Claes > From erik.osterlund at oracle.com Thu Dec 5 17:49:50 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 5 Dec 2019 18:49:50 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, I wonder what issue you ran into with std::numeric_limit on Solaris. I would personally like to understand that before dismissing its use and rolling our own function instead. Note that std::numeric_limit does not accept CV qualified types - perhaps that is what you ran into. Perhaps if you pass in RemoveCv::type instead of T to that trait, things will work properly. If you really want to note use numeric_limit, then you might have to stick in some more sanity checks in that functions, such as checking that the input is integral, etc. And dynamically calculating the limit does seem unfortunate. In that case I'd rather we rolled our own numeric limit trait, but I would be surprised if that is indeed necessary. Other than that, I would just like to say I approve of having this stuff in a separate file, as it includes a whole bunch of metaprogramming utilities, which I would not like to be added from the #include "allTheRandomStuff" header of our JVM, that is globalDefinitions.hpp. Other than that, I think this looks good. Oh - not sure but I have a gut feeling you can remove a few includes for zUtils that were only previously used to get to the power of two utility, which is no longer necessary. Thanks, /Erik On 2019-12-05 15:59, Claes Redestad wrote: > On 2019-12-05 11:26, Thomas St?fe wrote: >> Hi Claes, >> >> this looks nice, thank you for your work. Only minor stuff remains, >> mostly matters of taste. Feel free to ignore them. The patch is >> already fine as it is for me. >> >> With your latest patch, AIX still works. > > Great! > >> >> --- >> >> http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html >> >> >> Taste matter: I find the return type for count_leading_zeros as >> uint32_t oddly specific. Instinctively I would have choosen int or >> unsigned. The gcc buildins all return int. > > No particular reason. > >> >> --- >> >> Question, why do we treat signed numbers of 8 and 16 bits different >> wrt to values < 0? >> >> if you leave behavior as it is, could you please adjust the comment: >> >> ??// Return the number of leading zeros in x, e.g. the zero-based index >> ??// of the most significant set bit in x.? Undefined for 0. 
>> + // For signed numbers whose size is < 32bit, function will >> + // return 0 for values < 0. > > Such a comment is a bit redundant since count_leading_zeros will return > 0 for any negative input. Maybe we should spell that out more clearly, > though. > > The special cases avoided issues with sign extension when upcasting, > e.g, casting int8_t -1 to unsigned returns 0xFFFFFFFF, so > __builtin_clz((int8_t)-1) returns 0, which then messes up the manual > adjustment. But this can be further simplified by masking, avoiding > an extra branch: > > template struct CountLeadingZerosImpl { > ? static uint32_t doit(T v) { > ??? return __builtin_clz((uint32_t)v & 0xFF) - 24u; > ? } > }; > > template struct CountLeadingZerosImpl { > ? static uint32_t doit(T v) { > ??? return __builtin_clz((uint32_t)v & 0xFFFF) - 16u; > ? } > }; > >> >> ---- >> >> Taste matter: Windows, 64bit version for x86: You could call your own >> count_leading_zeros, instead of calling _BitScanReverse. > > Sure. > >> >> ---- >> >> http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html >> >> >> In the test for next_up, can we have, for completeness sake, a test >> which tests that with any valid input pow2 the result is the next pow2? >> >> --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 >> 02:28:58 2019 +0100 >> +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 >> 11:08:47 2019 +0100 >> @@ -139,6 +139,11 @@ >> ?? ? ?EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) >> ?? ? ? ?<< "value = " << pow2 - 1; >> ?? ?} >> + ?// next(pow2) should return pow2 * 2 >> + ?for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { >> + ? ?EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) >> + ? ? ?<< "value = " << pow2; >> + ?} > > Fixed. > > I also ran into an issue with use of std::numeric_limit::max() > causing obscure issues on some platforms at some call sites (*cough* > Solaris), so I replaced those usages with a portable (but not very > efficient) version of that in place for the few assert/test uses: > > http://cr.openjdk.java.net/~redestad/8234331/open.05 > > Testing: tier1-3, some ongoing > > /Claes From claes.redestad at oracle.com Thu Dec 5 18:04:32 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 19:04:32 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: <5fd925e4-c109-8c27-4af6-25a4976e5551@oracle.com> Hi Erik, thanks for looking at this. On 2019-12-05 18:49, Erik ?sterlund wrote: > Hi Claes, > > I wonder what issue you ran into with std::numeric_limit on Solaris. I > would personally like to understand that > before dismissing its use and rolling our own function instead. > Note that std::numeric_limit does not accept CV qualified types - > perhaps that is what you ran into. slowdebug failed to build with the following: [2019-12-05T01:34:22,502Z] Undefined first referenced [2019-12-05T01:34:22,502Z] symbol in file [2019-12-05T01:34:22,502Z] unsigned std::_Integer_limits::max() > > Perhaps if you pass in RemoveCv::type instead of T to that trait, > things will work properly. > If you really want to note use numeric_limit, then you might have to > stick in some more sanity checks in > that functions, such as checking that the input is integral, etc. And > dynamically calculating the limit > does seem unfortunate. 
In that case I'd rather we rolled our own numeric > limit trait, but I would be > surprised if that is indeed necessary. I can try the RemoveCv trick. This "private" max_value utility is not performance critical since it's only used for an assert and in tests, but yes it'd be much preferable if we could use std::numeric_limits<..> > > Other than that, I would just like to say I approve of having this stuff > in a separate file, as it includes > a whole bunch of metaprogramming utilities, which I would not like to be > added from the #include "allTheRandomStuff" > header of our JVM, that is globalDefinitions.hpp. Yes, in this particular case wedging these utilities into globalDefinitions is tricky. In fact impossible unless we also throw in count_leading_zeros.hpp there too. I do agree with others that some functions could be consolidated into bigger lumps. > > Other than that, I think this looks good. Oh - not sure but I have a gut > feeling you can remove a few includes > for zUtils that were only previously used to get to the power of two > utility, which is no longer necessary. Good point, I'll check. Thanks! /Claes From vladimir.kozlov at oracle.com Thu Dec 5 19:16:37 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 5 Dec 2019 11:16:37 -0800 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> CCing to GC group. Looks fine to me but someone from GC land have to look too. I wish we have more concrete indication for off-heap access instead of guessing it based on how we address memory through Unsafe API. Thanks, Vladimir On 11/29/19 7:42 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8226411 > > There were a number of fixes in C2 support for unsafe accesses recently which led to additional memory barriers around > them. It improved stability, but in some cases it was redundant. One of important use cases which regressed is off-heap > accesses [1]. The barriers around them are redundant because they are serialized on raw memory and don't intersect with > any on-heap accesses. > > Proposed fix skips memory barriers around unsafe accesses which are provably off-heap (base == NULL). > > It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. > > Testing: tier1-6. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8224182 From erik.osterlund at oracle.com Thu Dec 5 20:14:59 2019 From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=) Date: Thu, 5 Dec 2019 21:14:59 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> Message-ID: Hi, Could we use the existing IN_NATIVE decorator instead of introducing a new decorator that seems to be an alias for the same thing? The decorator describing its use (IN_NATIVE) says it is for off-heap accesses pointing into the heap. We can just remove from the comment the part presuming it is a reference. What do you think? Thanks, /Erik > On 5 Dec 2019, at 20:16, Vladimir Kozlov wrote: > > ?CCing to GC group. > > Looks fine to me but someone from GC land have to look too. 
> > I wish we have more concrete indication for off-heap access instead of guessing it based on how we address memory through Unsafe API. > > Thanks, > Vladimir > >> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8226411 >> There were a number of fixes in C2 support for unsafe accesses recently which led to additional memory barriers around them. It improved stability, but in some cases it was redundant. One of important use cases which regressed is off-heap accesses [1]. The barriers around them are redundant because they are serialized on raw memory and don't intersect with any on-heap accesses. >> Proposed fix skips memory barriers around unsafe accesses which are provably off-heap (base == NULL). >> It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. >> Testing: tier1-6. >> Best regards, >> Vladimir Ivanov >> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 From claes.redestad at oracle.com Thu Dec 5 22:26:54 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 23:26:54 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <5fd925e4-c109-8c27-4af6-25a4976e5551@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> <5fd925e4-c109-8c27-4af6-25a4976e5551@oracle.com> Message-ID: On 2019-12-05 19:04, Claes Redestad wrote: > On 2019-12-05 18:49, Erik ?sterlund wrote: >> Hi Claes, >> >> I wonder what issue you ran into with std::numeric_limit on Solaris. I >> would personally like to understand that >> before dismissing its use and rolling our own function instead. >> Note that std::numeric_limit does not accept CV qualified types - >> perhaps that is what you ran into. > > slowdebug failed to build with the following: > > [2019-12-05T01:34:22,502Z] Undefined??????????? first referenced > [2019-12-05T01:34:22,502Z]? symbol????????????????? in file > [2019-12-05T01:34:22,502Z] unsigned > std::_Integer_limits::max() > >> >> Perhaps if you pass in RemoveCv::type instead of T to that trait, >> things will work properly. >> If you really want to note use numeric_limit, then you might have to >> stick in some more sanity checks in >> that functions, such as checking that the input is integral, etc. And >> dynamically calculating the limit >> does seem unfortunate. In that case I'd rather we rolled our own >> numeric limit trait, but I would be >> surprised if that is indeed necessary. > > I can try the RemoveCv trick. This "private" max_value utility is not > performance critical since it's only used for an assert and in tests, > but yes it'd be much preferable if we could use std::numeric_limits<..> This doesn't fix the slowdebug build failure on Solaris. Ok if I file a follow-up bug to examine this in depth, and move ahead with the version here (open.05)? /Claes From christian.hagedorn at oracle.com Fri Dec 6 07:05:11 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 6 Dec 2019 08:05:11 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Message-ID: Thank you Tobias for your review! I fixed it. 
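On the 8226411 review quoted above: in graph terms, "provably off-heap (base == NULL)" means the base operand of the Unsafe access is statically the null constant, i.e. the caller passed null and is doing raw address arithmetic. A simplified sketch of that test (HotSpot-internal types; not the actual patch):

    #include "opto/node.hpp"
    #include "opto/phaseX.hpp"
    #include "opto/type.hpp"

    // Sketch: only a base known to be the null constant makes the access
    // provably off-heap. A base that merely might be null can still alias
    // the Java heap, so the conservative barriers must stay for it.
    static bool is_provably_off_heap(PhaseGVN* gvn, Node* base) {
      const TypePtr* tp = gvn->type(base)->isa_ptr();
      return tp == TypePtr::NULL_PTR;   // NULL (not a pointer type) compares unequal
    }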
Best regards, Christian On 05.12.19 16:16, Tobias Hartmann wrote: > Hi Christian, > > looks good to me but the CompileCommand argument in the @run statement of the test should come > before the test class name (no new webrev required). > > Best regards, > Tobias > > On 05.12.19 14:22, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8229994 >> http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ >> >> As a part of loop peeling, loop-invariant dominating tests are moved out of the loop [1]. Doing so >> requires to change all control inputs coming from an independent test 'test' to data nodes inside >> the loop body to use the peeled version 'peeled_test' of 'test' instead [2]: the control test->data >> will be changed to peeled_test->data. If a data node is expensive then an assertion [3] checks the >> dominance relation of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but fails to >> find 'peeled_test' as part of the idom chain starting from the previous control input 'test' of that >> expensive node. >> >> The reason is that the idom information was not correctly set before at [4] after creating the >> peeled nodes. If 'head' (a CountedLoop node) has another OuterStripMinedLoop node, let's say >> 'outer_head', as an input then 'outer_head' is set as idom of 'head' instead of setting the idom of >> 'outer_head' to the new loop entry from the peeled iteration. We then miss all the idom information >> of the peeled iteration and the idom of 'outer_head' still points to the old loop entry node before >> peeling. >> >> The fix is straight forward to also account for loop strip mined loops when correcting the idom to >> the new loop entry from the peeled iteration at [4]. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 >> [4] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 From navy.xliu at gmail.com Fri Dec 6 07:23:54 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Thu, 5 Dec 2019 23:23:54 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose Message-ID: Hi, Reviewers, Could you review this very simple bugfix for C1? JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ The root cause is some instructions are going to be eliminated, so they are not assigned to any valid bci. In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 will print them out and then hit the assert. Yes, I can twiddle graph_builder to assign right BCIs to them, but I would like to have a more robust InstructionPrinter::print_line. the CR will leave blanks in the position of bci. Eliminated store for object 0: . 0 a67 a58._24 := a54 (L) next Eliminated load: . 
0 i35 a11._24 (I) position thanks, --lx From tobias.hartmann at oracle.com Fri Dec 6 08:24:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 6 Dec 2019 09:24:01 +0100 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: Message-ID: <7176592d-4784-0797-4457-463773464536@oracle.com> Hi Liu, your fix looks good to me but could you please add a regression test? Thanks, Tobias On 06.12.19 08:23, Liu Xin wrote: > Hi, Reviewers, > > Could you review this very simple bugfix for C1? > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > The root cause is some instructions are going to be eliminated, so they are > not assigned to any valid bci. > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 will print > them out and then hit the assert. > > Yes, I can twiddle graph_builder to assign right BCIs to them, but I would > like to have a more robust InstructionPrinter::print_line. the CR will > leave blanks in the position of bci. > Eliminated store for object 0: > . 0 a67 a58._24 := a54 (L) next > Eliminated load: > . 0 i35 a11._24 (I) position > > thanks, > --lx > From rahul.v.raghavan at oracle.com Fri Dec 6 14:20:58 2019 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 6 Dec 2019 19:50:58 +0530 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out Message-ID: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> Hi, Please review the following fix changeset. - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ # https://bugs.openjdk.java.net/browse/JDK-8233453 # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] Issue is intermittent timeout failures of the test; mostly with solaris-sparc and sometimes with windows-x64. The proposed fix here is to increase the timeout factor for the test. [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] - * @run main/othervm + * @run main/othervm/timeout=300 ....... - * @run main/othervm + * @run main/othervm/timeout=300 For some timeout failure cases of the test run, logs got generated with extra thread dump info. e.g.: one attached to JBS page - https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. Also instead it seems as if the timeout factor for the test is less and that test just need more time to execute. Got no failures with repeat test run after above proposed fix, also supports this. Also checking the comments, fix done for old - - https://bugs.openjdk.java.net/browse/JDK-8212028 (Use run-test makefile framework for testing in Oracle's Mach5) - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - https://bugs.openjdk.java.net/browse/JDK-8221393 ResolvedMethodTable too small for StackWalking applications (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) Opened a separate task - JDK-8235485 - to check if this slowdown is expected. Not getting any timeout failure, for repeat test run with above fix proposal. 
Thanks, Rahul From vladimir.x.ivanov at oracle.com Fri Dec 6 15:19:58 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 6 Dec 2019 18:19:58 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> Message-ID: <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> Hi Erik, I like your idea. Here's updated version: http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 While browsing the code, I noticed that changes in G1BarrierSetC2::load_at_resolved() aren't required (need_cpu_mem_bar is used for oop case). But I decided to keep them to keep it (relatively) close to C2Access::needs_cpu_membar(). Best regards, Vladimir Ivanov On 05.12.2019 23:14, Erik ?sterlund wrote: > Hi, > > Could we use the existing IN_NATIVE decorator instead of introducing a new decorator that seems to be an alias for the same thing? The decorator describing its use (IN_NATIVE) says it is for off-heap accesses pointing into the heap. We can just remove from the comment the part presuming it is a reference. > > What do you think? > > Thanks, > /Erik > >> On 5 Dec 2019, at 20:16, Vladimir Kozlov wrote: >> >> ?CCing to GC group. >> >> Looks fine to me but someone from GC land have to look too. >> >> I wish we have more concrete indication for off-heap access instead of guessing it based on how we address memory through Unsafe API. >> >> Thanks, >> Vladimir >> >>> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>> There were a number of fixes in C2 support for unsafe accesses recently which led to additional memory barriers around them. It improved stability, but in some cases it was redundant. One of important use cases which regressed is off-heap accesses [1]. The barriers around them are redundant because they are serialized on raw memory and don't intersect with any on-heap accesses. >>> Proposed fix skips memory barriers around unsafe accesses which are provably off-heap (base == NULL). >>> It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. >>> Testing: tier1-6. >>> Best regards, >>> Vladimir Ivanov >>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 > From vladimir.x.ivanov at oracle.com Fri Dec 6 15:27:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 6 Dec 2019 18:27:32 +0300 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: <959d9da3-ae64-446b-5fbf-3a871fbf3642@oracle.com> Thanks, Martin. > The j.l.Thread Oop for Thread::current() is allowed to live in a register with this change. > GC will find it via OopMap at safepoints. So I think it's correct. FTR nothing changes in that respect: with the patch it behaves as if a j.l.Thread is cached in a local after the first Thread::currentThread() call. But it can be done explicitly on bytecode level in a user code. It's not the case that j.l.Thread oop couldn't be alive across a safepoint before and now it can. Best regards, Vladimir Ivanov >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov >> Sent: Donnerstag, 5. 
Dezember 2019 13:44 >> To: hotspot compiler >> Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in >> Thread::currentThread() intrinsic >> >> http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235143 >> >> Thread::currentThread() intrinsic doesn't need memory state: >> though multiple threads can execute same code, "current thread" can't >> change in the context of a single method activation. So, once it is >> observed, it's safe to share among all users. >> >> One of the use cases which benefit a lot from such optimization is >> ownership checks for thread confined resources (fast path check for >> owner thread to avoid heavy-weight synchronization). >> >> The patch was part of foreign-memaccess branch in Project Panama and >> showed good performance results on Memory Access API implementation >> [1]. >> >> Testing: tier1-4 >> >> PS: the optimization should be disabled in Project Loom: the assumption >> doesn't hold for continuations (in their current form). >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://openjdk.java.net/jeps/370 >> JEP 370: Foreign-Memory Access API (Incubator) From martin.doerr at sap.com Fri Dec 6 17:04:38 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 6 Dec 2019 17:04:38 +0000 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: <959d9da3-ae64-446b-5fbf-3a871fbf3642@oracle.com> References: <959d9da3-ae64-446b-5fbf-3a871fbf3642@oracle.com> Message-ID: Hi Vladimir, thanks for your additional comments. > It's not the case that j.l.Thread oop couldn't be alive across a > safepoint before and now it can. Right, but the j.l.Thread oop loads can move across safepoints, now. So they can possibly become additional GC roots in registers at safepoints. That should be fine. Best regards, Martin > -----Original Message----- > From: Vladimir Ivanov > Sent: Freitag, 6. Dezember 2019 16:28 > To: Doerr, Martin ; hotspot compiler compiler-dev at openjdk.java.net> > Subject: Re: [14] RFR (XXS): 8235143: C2: No memory state needed in > Thread::currentThread() intrinsic > > Thanks, Martin. > > > The j.l.Thread Oop for Thread::current() is allowed to live in a register with > this change. > > GC will find it via OopMap at safepoints. So I think it's correct. > > FTR nothing changes in that respect: with the patch it behaves as if a > j.l.Thread is cached in a local after the first Thread::currentThread() > call. But it can be done explicitly on bytecode level in a user code. > It's not the case that j.l.Thread oop couldn't be alive across a > safepoint before and now it can. > > Best regards, > Vladimir Ivanov > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov > >> Sent: Donnerstag, 5. Dezember 2019 13:44 > >> To: hotspot compiler > >> Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in > >> Thread::currentThread() intrinsic > >> > >> http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ > >> https://bugs.openjdk.java.net/browse/JDK-8235143 > >> > >> Thread::currentThread() intrinsic doesn't need memory state: > >> though multiple threads can execute same code, "current thread" can't > >> change in the context of a single method activation. So, once it is > >> observed, it's safe to share among all users. 
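To make the "ownership checks for thread confined resources" use case in the quoted RFR concrete, the pattern is a cheap owner comparison on every access; sketched here in VM-level C++ purely for illustration, while the real beneficiary is JIT-compiled Java code performing the same check:

    #include "runtime/thread.hpp"

    // Illustrative only. With 8235143 the thread oop feeding such a check
    // can be loaded once and commoned, instead of being re-loaded whenever
    // the compiler cannot prove memory unchanged.
    static inline bool owned_by_current_thread(JavaThread* current, oop owner) {
      return current->threadObj() == owner;
    }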
> >> > >> One of the use cases which benefit a lot from such optimization is > >> ownership checks for thread confined resources (fast path check for > >> owner thread to avoid heavy-weight synchronization). > >> > >> The patch was part of foreign-memaccess branch in Project Panama and > >> showed good performance results on Memory Access API > implementation > >> [1]. > >> > >> Testing: tier1-4 > >> > >> PS: the optimization should be disabled in Project Loom: the assumption > >> doesn't hold for continuations (in their current form). > >> > >> Best regards, > >> Vladimir Ivanov > >> > >> [1] https://openjdk.java.net/jeps/370 > >> JEP 370: Foreign-Memory Access API (Incubator) From erik.osterlund at oracle.com Fri Dec 6 17:10:37 2019 From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=) Date: Fri, 6 Dec 2019 18:10:37 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: Message-ID: <215D2CA3-AFE2-4943-A483-809BBAA514CB@oracle.com> Hi Claes, Sure. Looks good! Thanks, /Erik > On 5 Dec 2019, at 23:24, Claes Redestad wrote: > > ? > >> On 2019-12-05 19:04, Claes Redestad wrote: >>> On 2019-12-05 18:49, Erik ?sterlund wrote: >>> Hi Claes, >>> >>> I wonder what issue you ran into with std::numeric_limit on Solaris. I would personally like to understand that >>> before dismissing its use and rolling our own function instead. >>> Note that std::numeric_limit does not accept CV qualified types - perhaps that is what you ran into. >> slowdebug failed to build with the following: >> [2019-12-05T01:34:22,502Z] Undefined first referenced >> [2019-12-05T01:34:22,502Z] symbol in file >> [2019-12-05T01:34:22,502Z] unsigned >> std::_Integer_limits::max() >>> >>> Perhaps if you pass in RemoveCv::type instead of T to that trait, things will work properly. >>> If you really want to note use numeric_limit, then you might have to stick in some more sanity checks in >>> that functions, such as checking that the input is integral, etc. And dynamically calculating the limit >>> does seem unfortunate. In that case I'd rather we rolled our own numeric limit trait, but I would be >>> surprised if that is indeed necessary. >> I can try the RemoveCv trick. This "private" max_value utility is not >> performance critical since it's only used for an assert and in tests, >> but yes it'd be much preferable if we could use std::numeric_limits<..> > > This doesn't fix the slowdebug build failure on Solaris. > > Ok if I file a follow-up bug to examine this in depth, and move ahead with the version here (open.05)? > > /Claes From igor.ignatyev at oracle.com Fri Dec 6 17:12:07 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 6 Dec 2019 09:12:07 -0800 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out In-Reply-To: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> References: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> Message-ID: <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> Hi Rahul, the fix sound reasonable to me. Thanks, Igor > On Dec 6, 2019, at 6:20 AM, Rahul Raghavan wrote: > > Hi, > > Please review the following fix changeset. > > - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ > > # https://bugs.openjdk.java.net/browse/JDK-8233453 > # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] > > Issue is intermittent timeout failures of the test; > mostly with solaris-sparc and sometimes with windows-x64. 
> > The proposed fix here is to increase the timeout factor for the test. > [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] > - * @run main/othervm > + * @run main/othervm/timeout=300 > ....... > - * @run main/othervm > + * @run main/othervm/timeout=300 > > For some timeout failure cases of the test run, logs got generated with extra thread dump info. > e.g.: one attached to JBS page - > https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr > But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. > Also instead it seems as if the timeout factor for the test is less > and that test just need more time to execute. > Got no failures with repeat test run after above proposed fix, also supports this. > > > Also checking the comments, fix done for old - > - https://bugs.openjdk.java.net/browse/JDK-8212028 > (Use run-test makefile framework for testing in Oracle's Mach5) > - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 > Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. > > > It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - > https://bugs.openjdk.java.net/browse/JDK-8221393 > ResolvedMethodTable too small for StackWalking applications > (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) > Opened a separate task - JDK-8235485 - to check if this slowdown is expected. > > > Not getting any timeout failure, for repeat test run with above fix proposal. > > > Thanks, > Rahul From tobias.hartmann at oracle.com Fri Dec 6 17:23:09 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 6 Dec 2019 18:23:09 +0100 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out In-Reply-To: <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> References: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> Message-ID: <6a887d8a-8843-bd21-857d-1d41531738aa@oracle.com> +1 Best regards, Tobias On 06.12.19 18:12, Igor Ignatyev wrote: > Hi Rahul, > > the fix sound reasonable to me. > > Thanks, > Igor > >> On Dec 6, 2019, at 6:20 AM, Rahul Raghavan wrote: >> >> Hi, >> >> Please review the following fix changeset. >> >> - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ >> >> # https://bugs.openjdk.java.net/browse/JDK-8233453 >> # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >> >> Issue is intermittent timeout failures of the test; >> mostly with solaris-sparc and sometimes with windows-x64. >> >> The proposed fix here is to increase the timeout factor for the test. >> [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >> - * @run main/othervm >> + * @run main/othervm/timeout=300 >> ....... >> - * @run main/othervm >> + * @run main/othervm/timeout=300 >> >> For some timeout failure cases of the test run, logs got generated with extra thread dump info. >> e.g.: one attached to JBS page - >> https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr >> But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. >> Also instead it seems as if the timeout factor for the test is less >> and that test just need more time to execute. >> Got no failures with repeat test run after above proposed fix, also supports this. 
>> >> >> Also checking the comments, fix done for old - >> - https://bugs.openjdk.java.net/browse/JDK-8212028 >> (Use run-test makefile framework for testing in Oracle's Mach5) >> - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 >> Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. >> >> >> It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - >> https://bugs.openjdk.java.net/browse/JDK-8221393 >> ResolvedMethodTable too small for StackWalking applications >> (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) >> Opened a separate task - JDK-8235485 - to check if this slowdown is expected. >> >> >> Not getting any timeout failure, for repeat test run with above fix proposal. >> >> >> Thanks, >> Rahul > From vladimir.kozlov at oracle.com Fri Dec 6 19:23:20 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 11:23:20 -0800 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method Message-ID: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8235438 This fix is prepared by Tom R. "The JDK14 version of StackTraceElement::decode is based on the JDK8 code which contains mixed usages of method->constants() and method->method_holder()->constants() assuming they point to the same thing. In the case of anonymous methods this isn't true. Usually this isn't a problem but if CDS is enabled the the version flag of method->method_holder()->constants() is non-zero but version of of method->constants() is 0 which causes the code to switch constants pools and it reads garbage. JDK-8140685 [1] refactored this code to remove this logic and the JVMCI version of this code should be converted to use the same scheme." I tested tier1-2 and tier3-4-graal. All clean. Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8140685 From sandhya.viswanathan at intel.com Fri Dec 6 19:45:56 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 6 Dec 2019 19:45:56 +0000 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 Message-ID: The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ Please review and approve. Best Regards, Sandhya From vladimir.kozlov at oracle.com Fri Dec 6 20:04:26 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 12:04:26 -0800 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: References: Message-ID: Hi Sandhya, Ii is confusing after looking on this and 8200067 changes [1]. CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. Is vpclmulqdq also affected? 
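One hypothetical shape for the rename Vladimir asks for (name and composition here are assumptions for illustration, not the actual webrev.01 change) would make the predicate say what the CRC32 stub really needs, namely the EVEX-encoded form on an AVX-512 capable CPU:

    // vm_version_x86.hpp sketch, hypothetical:
    static bool supports_avx512_vpclmulqdq() {
      // CPUID advertises VPCLMULQDQ and the EVEX encoding is usable.
      return supports_vpclmulqdq() && supports_evex();
    }

In the real change the old accessor may simply be renamed rather than wrapped; the point is only that the name should match the instruction form it guards.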
Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: > The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. > An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From coleen.phillimore at oracle.com Fri Dec 6 20:11:19 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 6 Dec 2019 15:11:19 -0500 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> Message-ID: This looks good.? I'm glad to see less places that have to call version_matches(), and this awkward handling for redefinition. CDS should probably zero version after it loads the class out of the archive too, which I assume would have fixed it.? But I like the new version of the code better than the old. Also, one small nit, which made reviewing this in frames a pain: Can you split these long lines? 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle java_class, InstanceKlass* holder, int version, 2723 const methodHandle& method, int bci, 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { 2746 void java_lang_StackTraceElement::decode(const methodHandle& method, int bci, Symbol*& methodname, Symbol*& filename, int& line_number, TRAPS) { I don't see why the caller of java_lang_StackTraceElement can't get the methodname itself, and save this one output parameter. thanks, Coleen On 12/6/19 2:23 PM, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235438 > > This fix is prepared by Tom R. > > "The JDK14 version of StackTraceElement::decode is based on the JDK8 > code which contains mixed usages of method->constants() and > method->method_holder()->constants() assuming they point to the same > thing. In the case of anonymous methods this isn't true. Usually this > isn't a problem but if CDS is enabled the the version flag of > method->method_holder()->constants() is non-zero but version of of > method->constants() is 0 which causes the code to switch constants > pools and it reads garbage. JDK-8140685 [1] refactored this code to > remove this logic and the JVMCI version of this code should be > converted to use the same scheme." > > I tested tier1-2 and tier3-4-graal. All clean. > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8140685 From vladimir.kozlov at oracle.com Fri Dec 6 21:42:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 13:42:21 -0800 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> Message-ID: <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> Thank you, Coleen On 12/6/19 12:11 PM, coleen.phillimore at oracle.com wrote: > This looks good.? 
I'm glad to see less places that have to call version_matches(), and this awkward handling for > redefinition. > > CDS should probably zero version after it loads the class out of the archive too, which I assume would have fixed it. > But I like the new version of the code better than the old. > > Also, one small nit, which made reviewing this in frames a pain: Can you split these long lines? > > 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle java_class, InstanceKlass* holder, int version, > 2723 const methodHandle& method, int bci, > 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { > > 2746 void java_lang_StackTraceElement::decode(const methodHandle& method, int bci, Symbol*& methodname, Symbol*& > filename, int& line_number, TRAPS) { Done. > > I don't see why the caller of java_lang_StackTraceElement can't get the methodname itself, and save this one output > parameter. Good suggestion. https://cr.openjdk.java.net/~kvn/8235438/webrev.01/ Thanks, Vladimir > > thanks, > Coleen > > > On 12/6/19 2:23 PM, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235438 >> >> This fix is prepared by Tom R. >> >> "The JDK14 version of StackTraceElement::decode is based on the JDK8 code which contains mixed usages of >> method->constants() and method->method_holder()->constants() assuming they point to the same thing. In the case of >> anonymous methods this isn't true. Usually this isn't a problem but if CDS is enabled the the version flag of >> method->method_holder()->constants() is non-zero but version of of method->constants() is 0 which causes the code to >> switch constants pools and it reads garbage. JDK-8140685 [1] refactored this code to remove this logic and the JVMCI >> version of this code should be converted to use the same scheme." >> >> I tested tier1-2 and tier3-4-graal. All clean. >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8140685 > From coleen.phillimore at oracle.com Fri Dec 6 21:47:07 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 6 Dec 2019 16:47:07 -0500 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> Message-ID: <898a6ab3-63ed-8884-73dd-425358bcb53f@oracle.com> On 12/6/19 4:42 PM, Vladimir Kozlov wrote: > Thank you, Coleen > > On 12/6/19 12:11 PM, coleen.phillimore at oracle.com wrote: >> This looks good.? I'm glad to see less places that have to call >> version_matches(), and this awkward handling for redefinition. >> >> CDS should probably zero version after it loads the class out of the >> archive too, which I assume would have fixed it.? But I like the new >> version of the code better than the old. >> >> Also, one small nit, which made reviewing this in frames a pain: Can >> you split these long lines? >> >> 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle >> java_class, InstanceKlass* holder, int version, >> 2723 const methodHandle& method, int bci, >> 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { >> >> 2746 void java_lang_StackTraceElement::decode(const methodHandle& >> method, int bci, Symbol*& methodname, Symbol*& filename, int& >> line_number, TRAPS) { > > Done. 
> >> >> I don't see why the caller of java_lang_StackTraceElement can't get >> the methodname itself, and save this one output parameter. > > Good suggestion. > > https://cr.openjdk.java.net/~kvn/8235438/webrev.01/ Yes, that makes is clear which functions want which output parameters. Looks good - thanks, Coleen > > Thanks, > Vladimir > >> >> thanks, >> Coleen >> >> >> On 12/6/19 2:23 PM, Vladimir Kozlov wrote: >>> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8235438 >>> >>> This fix is prepared by Tom R. >>> >>> "The JDK14 version of StackTraceElement::decode is based on the JDK8 >>> code which contains mixed usages of method->constants() and >>> method->method_holder()->constants() assuming they point to the same >>> thing. In the case of anonymous methods this isn't true. Usually >>> this isn't a problem but if CDS is enabled the the version flag of >>> method->method_holder()->constants() is non-zero but version of of >>> method->constants() is 0 which causes the code to switch constants >>> pools and it reads garbage. JDK-8140685 [1] refactored this code to >>> remove this logic and the JVMCI version of this code should be >>> converted to use the same scheme." >>> >>> I tested tier1-2 and tier3-4-graal. All clean. >>> >>> Thanks, >>> Vladimir >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8140685 >> From vladimir.kozlov at oracle.com Fri Dec 6 21:58:17 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 13:58:17 -0800 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: <898a6ab3-63ed-8884-73dd-425358bcb53f@oracle.com> References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> <898a6ab3-63ed-8884-73dd-425358bcb53f@oracle.com> Message-ID: <9cb378b7-129c-2625-c754-c381e3ae7b91@oracle.com> Thank you, Coleen Vladimir On 12/6/19 1:47 PM, coleen.phillimore at oracle.com wrote: > > > On 12/6/19 4:42 PM, Vladimir Kozlov wrote: >> Thank you, Coleen >> >> On 12/6/19 12:11 PM, coleen.phillimore at oracle.com wrote: >>> This looks good.? I'm glad to see less places that have to call version_matches(), and this awkward handling for >>> redefinition. >>> >>> CDS should probably zero version after it loads the class out of the archive too, which I assume would have fixed >>> it.? But I like the new version of the code better than the old. >>> >>> Also, one small nit, which made reviewing this in frames a pain: Can you split these long lines? >>> >>> 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle java_class, InstanceKlass* holder, int version, >>> 2723 const methodHandle& method, int bci, >>> 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { >>> >>> 2746 void java_lang_StackTraceElement::decode(const methodHandle& method, int bci, Symbol*& methodname, Symbol*& >>> filename, int& line_number, TRAPS) { >> >> Done. >> >>> >>> I don't see why the caller of java_lang_StackTraceElement can't get the methodname itself, and save this one output >>> parameter. >> >> Good suggestion. >> >> https://cr.openjdk.java.net/~kvn/8235438/webrev.01/ > > Yes, that makes is clear which functions want which output parameters. 
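Reduced to its core, the rule behind the 8235438 fix discussed above is that every piece of decoding context comes from the Method* actually being decoded, so the constant pool whose version is examined is the one that owns the line-number information. A sketch, not the JVMCI patch itself:

    #include "oops/constantPool.hpp"
    #include "oops/method.hpp"

    // For VM-anonymous methods, method->constants() and
    // method->method_holder()->constants() can differ (notably in the
    // version field when CDS is in use), so always start from the method.
    static void decode_context(const methodHandle& method, int bci,
                               InstanceKlass*& holder, int& version, int& line) {
      holder  = method->method_holder();
      version = method->constants()->version();
      line    = method->line_number_from_bci(bci);
    }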
> > Looks good - thanks, > Coleen >> >> Thanks, >> Vladimir >> >>> >>> thanks, >>> Coleen >>> >>> >>> On 12/6/19 2:23 PM, Vladimir Kozlov wrote: >>>> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8235438 >>>> >>>> This fix is prepared by Tom R. >>>> >>>> "The JDK14 version of StackTraceElement::decode is based on the JDK8 code which contains mixed usages of >>>> method->constants() and method->method_holder()->constants() assuming they point to the same thing. In the case of >>>> anonymous methods this isn't true. Usually this isn't a problem but if CDS is enabled the the version flag of >>>> method->method_holder()->constants() is non-zero but version of of method->constants() is 0 which causes the code to >>>> switch constants pools and it reads garbage. JDK-8140685 [1] refactored this code to remove this logic and the JVMCI >>>> version of this code should be converted to use the same scheme." >>>> >>>> I tested tier1-2 and tier3-4-graal. All clean. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8140685 >>> > From sandhya.viswanathan at intel.com Fri Dec 6 22:05:16 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 6 Dec 2019 22:05:16 +0000 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: References: Message-ID: Hi Vladimir, It is only AVX512 evpclmulqdq() based CRC32 that is affected. In the updated webrev, I have changed the supports_vpclmulqdq() to supports_avx512_vpclmulqdq(). JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 Updated webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.01/ Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Friday, December 06, 2019 12:04 PM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 Hi Sandhya, Ii is confusing after looking on this and 8200067 changes [1]. CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. Is vpclmulqdq also affected? Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: > The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. > An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From vladimir.kozlov at oracle.com Fri Dec 6 22:41:16 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 14:41:16 -0800 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: References: Message-ID: <5b72ed1e-a723-78cc-e836-758aa8481ae2@oracle.com> Good. Do you want to backport this into 11u too? 
Thanks, Vladimir On 12/6/19 2:05 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It is only AVX512 evpclmulqdq() based CRC32 that is affected. > In the updated webrev, I have changed the supports_vpclmulqdq() to supports_avx512_vpclmulqdq(). > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Updated webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.01/ > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, December 06, 2019 12:04 PM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 > > Hi Sandhya, > > Ii is confusing after looking on this and 8200067 changes [1]. > CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. > I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. > > Is vpclmulqdq also affected? > > Thanks, > Vladimir > > [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ > [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 > > On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: >> The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. >> An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ >> >> Please review and approve. >> >> Best Regards, >> Sandhya >> From sandhya.viswanathan at intel.com Fri Dec 6 23:02:10 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 6 Dec 2019 23:02:10 +0000 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: <5b72ed1e-a723-78cc-e836-758aa8481ae2@oracle.com> References: <5b72ed1e-a723-78cc-e836-758aa8481ae2@oracle.com> Message-ID: Yes I want to backport this to 11u too. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Friday, December 06, 2019 2:41 PM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 Good. Do you want to backport this into 11u too? Thanks, Vladimir On 12/6/19 2:05 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It is only AVX512 evpclmulqdq() based CRC32 that is affected. > In the updated webrev, I have changed the supports_vpclmulqdq() to supports_avx512_vpclmulqdq(). > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Updated webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.01/ > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, December 06, 2019 12:04 PM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 > > Hi Sandhya, > > Ii is confusing after looking on this and 8200067 changes [1]. > CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. 
> I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. > > Is vpclmulqdq also affected? > > Thanks, > Vladimir > > [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ > [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 > > On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: >> The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. >> An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ >> >> Please review and approve. >> >> Best Regards, >> Sandhya >> From john.r.rose at oracle.com Sat Dec 7 01:42:24 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 6 Dec 2019 17:42:24 -0800 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: Looks good, although it?s a little more complicated to read. It makes me wonder what?s wrong with the various overloadings of GraphKit::make_load, that you have to open-code a special call. If this happens a lot, we should try to figure out a new way to call GK::make_load. I would have thought there would be an adr_idx value for C->immutable_memory! Feeding from immutable_memory will probably be a good thing to do in more circumstances as we get more reliably immutable data. ? John > On Dec 5, 2019, at 4:43 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235143 > > Thread::currentThread() intrinsic doesn't need memory state: > though multiple threads can execute same code, "current thread" can't change in the context of a single method activation. So, once it is observed, it's safe to share among all users. > > One of the use cases which benefit a lot from such optimization is ownership checks for thread confined resources (fast path check for owner thread to avoid heavy-weight synchronization). > > The patch was part of foreign-memaccess branch in Project Panama and showed good performance results on Memory Access API implementation [1]. > > Testing: tier1-4 > > PS: the optimization should be disabled in Project Loom: the assumption doesn't hold for continuations (in their current form). > > Best regards, > Vladimir Ivanov > > [1] https://openjdk.java.net/jeps/370 > JEP 370: Foreign-Memory Access API (Incubator) From john.r.rose at oracle.com Sat Dec 7 02:09:24 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 6 Dec 2019 18:09:24 -0800 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: <86F63579-C1DB-469D-8E8F-E8017C28D342@oracle.com> On Nov 29, 2019, at 7:42 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8226411 That looks good to me. I wonder if there is a similar opportunity in LibraryCallKit::inline_unsafe_load_store or inline_vector_mem_operation or inline_unsafe_copyMemory. All of those also form unsafe addresses, and at least some seem to mention NULL_PTR or IN_HEAP. 
Is it worth an explanatory comment or a tracking bug? ? John From navy.xliu at gmail.com Sat Dec 7 04:26:16 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Fri, 6 Dec 2019 20:26:16 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: <7176592d-4784-0797-4457-463773464536@oracle.com> References: <7176592d-4784-0797-4457-463773464536@oracle.com> Message-ID: hi, Tobias, Thank you for reviewing it. I add a regression test about it. Could you take a look? https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ thanks, --lx On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann wrote: > Hi Liu, > > your fix looks good to me but could you please add a regression test? > > Thanks, > Tobias > > On 06.12.19 08:23, Liu Xin wrote: > > Hi, Reviewers, > > > > Could you review this very simple bugfix for C1? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > The root cause is some instructions are going to be eliminated, so they > are > > not assigned to any valid bci. > > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 will > print > > them out and then hit the assert. > > > > Yes, I can twiddle graph_builder to assign right BCIs to them, but I > would > > like to have a more robust InstructionPrinter::print_line. the CR will > > leave blanks in the position of bci. > > Eliminated store for object 0: > > . 0 a67 a58._24 := a54 (L) next > > Eliminated load: > > . 0 i35 a11._24 (I) position > > > > thanks, > > --lx > > > From claes.redestad at oracle.com Sun Dec 8 21:44:53 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Sun, 8 Dec 2019 22:44:53 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel Message-ID: Hi, increasing MaxInlineLevel can substantially improve performance in some benchmarks[1], and has been reported to help applications implemented in scala in particular. There is always some risk of regressions when tweaking the default inlining settings. I've done a number of experiments to ascertain that the effect of increasing this on a wide array of benchmarks. With 15 all benchmarks tested are show either neutral or positive results, with no observed regression w.r.t. compilation speed or code cache usage. Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ Thanks! /Claes [1] One http://renaissance.dev sub-benchmark improve by almost 3x with an increase from 9 to 15. From navy.xliu at gmail.com Sun Dec 8 23:23:21 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Sun, 8 Dec 2019 15:23:21 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: Hi, Claes, If you increase MaxInlineLevel from 9 to 15, don't you need adjust initial and maximal size of CodeCache? Could you elaborate 3x sub-benchmark? Is it java or scala? thanks, --lx On Sun, Dec 8, 2019 at 1:42 PM Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. 
> > Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > > Thanks! > > /Claes > > [1] One http://renaissance.dev sub-benchmark improve by almost 3x with > an increase from 9 to 15. > From claes.redestad at oracle.com Mon Dec 9 00:22:04 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 01:22:04 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <52e0eae4-1df4-bcee-af59-c889d0d25f53@oracle.com> Hi lx, On 2019-12-09 00:23, Liu Xin wrote: > Hi, Claes, > > If you increase MaxInlineLevel from 9 to 15, don't you need adjust > initial and maximal size of CodeCache? Not necessarily: yes, more aggressive inlining may lead to larger nmethods and less sharing overall, which *could* mean we need more code cache space over time. But it may also mean more aggressive dead code elimination, which can mean a net reduction in code cache utilization for some applications. For the applications and benchmarks we've tested, code cache utilization is largely the same. And while we can't rule out that some applications may see regressions, data strongly suggests a more aggressive setting will be better on average for most. > Could you elaborate?3x sub-benchmark? Is it java or scala? IIRC scala-kmeans was the sub-benchmark that saw the largest improvement. /Claes From vladimir.kozlov at oracle.com Mon Dec 9 00:52:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 8 Dec 2019 16:52:42 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> Nice finding! Good. Thanks, Vladimir On 12/8/19 1:44 PM, Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. > > Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > > Thanks! > > /Claes > > [1] One http://renaissance.dev sub-benchmark improve by almost 3x with > an increase from 9 to 15. From navy.xliu at gmail.com Mon Dec 9 05:19:19 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Sun, 8 Dec 2019 21:19:19 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <52e0eae4-1df4-bcee-af59-c889d0d25f53@oracle.com> References: <52e0eae4-1df4-bcee-af59-c889d0d25f53@oracle.com> Message-ID: hi, Claes, Glad to know that bigger scope for optimizations can offset the bigger methods. Thank you for the clarification. thanks --lx On Sun, Dec 8, 2019 at 4:19 PM Claes Redestad wrote: > Hi lx, > > On 2019-12-09 00:23, Liu Xin wrote: > > Hi, Claes, > > > > If you increase MaxInlineLevel from 9 to 15, don't you need adjust > > initial and maximal size of CodeCache? > > Not necessarily: yes, more aggressive inlining may lead to larger > nmethods and less sharing overall, which *could* mean we need more code > cache space over time. But it may also mean more aggressive dead code > elimination, which can mean a net reduction in code cache utilization > for some applications. 
> > For the applications and benchmarks we've tested, code cache utilization > is largely the same. And while we can't rule out that some applications > may see regressions, data strongly suggests a more aggressive setting > will be better on average for most. > > > Could you elaborate 3x sub-benchmark? Is it java or scala? > > IIRC scala-kmeans was the sub-benchmark that saw the largest > improvement. > > /Claes > From tobias.hartmann at oracle.com Mon Dec 9 07:15:22 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 08:15:22 +0100 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: <7176592d-4784-0797-4457-463773464536@oracle.com> Message-ID: <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Hi Liu, thanks for adding the test. We try to avoid bug ids as test names. In this case, I would suggest something like TestPrintIRDuringConstruction with a compiler.c1 package declaration. Also, the first line of the copyright header looks wrong (should look like this [1]). line 27-29: I don't think you need these lines line 42: "initiail" -> "initial" Thanks, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java On 07.12.19 05:26, Liu Xin wrote: > hi, Tobias,? > > Thank you for reviewing it. I add a regression test about it. Could you take a look? > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > thanks, > > --lx > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann > wrote: > > Hi Liu, > > your fix looks good to me but could you please add a regression test? > > Thanks, > Tobias > > On 06.12.19 08:23, Liu Xin wrote: > > Hi, Reviewers, > > > > Could you review this very simple bugfix for C1? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > The root cause is some instructions are going to be eliminated, so they are > > not assigned to any valid bci. > > In present of? -XX:+PrintIRDuringConstruction -XX:+Verbose,? C1 will print > > them out and then hit the assert. > > > > Yes, I can twiddle graph_builder to assign right BCIs to them,? but I would > > like to have a more robust InstructionPrinter::print_line. the CR will > > leave blanks in the position of bci. > > Eliminated store for object 0: > > .? ? ? 0? ? a67? ? a58._24 := a54 (L) next > > Eliminated load: > > .? ? ? 0? ? i35? ? a11._24 (I) position > > > > thanks, > > --lx > > > From navy.xliu at gmail.com Mon Dec 9 08:17:41 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Mon, 9 Dec 2019 00:17:41 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> References: <7176592d-4784-0797-4457-463773464536@oracle.com> <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Message-ID: Hi, Tobias, Thanks for your feedback. Here is the new webrev. I update what you pointed out. https://cr.openjdk.java.net/~xliu/8235383/02/webrev/ The patch passed hotspot-tier1 for both fastdebug and release builds. thanks, --lx On Sun, Dec 8, 2019 at 11:15 PM Tobias Hartmann wrote: > Hi Liu, > > thanks for adding the test. > > We try to avoid bug ids as test names. In this case, I would suggest > something like > TestPrintIRDuringConstruction with a compiler.c1 package declaration. > Also, the first line of the > copyright header looks wrong (should look like this [1]). 
> > line 27-29: I don't think you need these lines > line 42: "initiail" -> "initial" > > Thanks, > Tobias > > [1] > > http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java > > On 07.12.19 05:26, Liu Xin wrote: > > hi, Tobias, > > > > Thank you for reviewing it. I add a regression test about it. Could you > take a look? > > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > > > thanks, > > > > --lx > > > > > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann < > tobias.hartmann at oracle.com > > > wrote: > > > > Hi Liu, > > > > your fix looks good to me but could you please add a regression test? > > > > Thanks, > > Tobias > > > > On 06.12.19 08:23, Liu Xin wrote: > > > Hi, Reviewers, > > > > > > Could you review this very simple bugfix for C1? > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > > > The root cause is some instructions are going to be eliminated, so > they are > > > not assigned to any valid bci. > > > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 > will print > > > them out and then hit the assert. > > > > > > Yes, I can twiddle graph_builder to assign right BCIs to them, > but I would > > > like to have a more robust InstructionPrinter::print_line. the CR > will > > > leave blanks in the position of bci. > > > Eliminated store for object 0: > > > . 0 a67 a58._24 := a54 (L) next > > > Eliminated load: > > > . 0 i35 a11._24 (I) position > > > > > > thanks, > > > --lx > > > > > > From rahul.v.raghavan at oracle.com Mon Dec 9 08:38:03 2019 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Mon, 9 Dec 2019 14:08:03 +0530 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out In-Reply-To: <6a887d8a-8843-bd21-857d-1d41531738aa@oracle.com> References: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> <6a887d8a-8843-bd21-857d-1d41531738aa@oracle.com> Message-ID: Thank Igor, Tobias. On 06/12/19 10:53 pm, Tobias Hartmann wrote: > +1 > > Best regards, > Tobias > > On 06.12.19 18:12, Igor Ignatyev wrote: >> Hi Rahul, >> >> the fix sound reasonable to me. >> >> Thanks, >> Igor >> >>> On Dec 6, 2019, at 6:20 AM, Rahul Raghavan wrote: >>> >>> Hi, >>> >>> Please review the following fix changeset. >>> >>> - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ >>> >>> # https://bugs.openjdk.java.net/browse/JDK-8233453 >>> # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >>> >>> Issue is intermittent timeout failures of the test; >>> mostly with solaris-sparc and sometimes with windows-x64. >>> >>> The proposed fix here is to increase the timeout factor for the test. >>> [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >>> - * @run main/othervm >>> + * @run main/othervm/timeout=300 >>> ....... >>> - * @run main/othervm >>> + * @run main/othervm/timeout=300 >>> >>> For some timeout failure cases of the test run, logs got generated with extra thread dump info. >>> e.g.: one attached to JBS page - >>> https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr >>> But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. >>> Also instead it seems as if the timeout factor for the test is less >>> and that test just need more time to execute. >>> Got no failures with repeat test run after above proposed fix, also supports this. 
>>> >>> >>> Also checking the comments, fix done for old - >>> - https://bugs.openjdk.java.net/browse/JDK-8212028 >>> (Use run-test makefile framework for testing in Oracle's Mach5) >>> - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 >>> Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. >>> >>> >>> It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - >>> https://bugs.openjdk.java.net/browse/JDK-8221393 >>> ResolvedMethodTable too small for StackWalking applications >>> (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) >>> Opened a separate task - JDK-8235485 - to check if this slowdown is expected. >>> >>> >>> Not getting any timeout failure, for repeat test run with above fix proposal. >>> >>> >>> Thanks, >>> Rahul >> From claes.redestad at oracle.com Mon Dec 9 10:15:04 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 11:15:04 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> Message-ID: <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> Thanks for the review! /Claes On 2019-12-09 01:52, Vladimir Kozlov wrote: > Nice finding! Good. > > Thanks, > Vladimir > > On 12/8/19 1:44 PM, Claes Redestad wrote: >> Hi, >> >> increasing MaxInlineLevel can substantially improve performance in some >> benchmarks[1], and has been reported to help applications implemented in >> scala in particular. >> >> There is always some risk of regressions when tweaking the default >> inlining settings. I've done a number of experiments to ascertain that >> the effect of increasing this on a wide array of benchmarks. With 15 all >> benchmarks tested are show either neutral or positive results, with no >> observed regression w.r.t. compilation speed or code cache usage. >> >> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >> >> Thanks! >> >> /Claes >> >> [1] One http://renaissance.dev sub-benchmark improve by almost 3x with >> an increase from 9 to 15. From martin.doerr at sap.com Mon Dec 9 11:16:30 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Dec 2019 11:16:30 +0000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> Message-ID: Hi, I think tuning inlining makes sense for C2. The problem I see is that C1 uses the same inlining flags. C1 doesn't have the concept of uncommon traps so it compiles all paths. That's why I think this change is only good for C2. So I suggest separating C1 inlining flags before tuning them for C2 like in the example below (note that new flags require CSR). Feedback for this idea is welcome. 
Best regards, Martin diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp --- a/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 10:26:41 2019 +0100 +++ b/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 12:08:37 2019 +0100 @@ -174,6 +174,12 @@ develop_pd(bool, RoundFPResults, \ "Indicates whether rounding is needed for floating point results")\ \ + product(intx, C1MaxInlineSize, 35, \ + "The maximum bytecode size of a method to be inlined by C1") \ + product(intx, C1MaxInlineLevel, 9, \ + "The maximum number of nested calls that are inlined by C1") \ + product(intx, C1MaxRecursiveInlineLevel, 1, \ + "maximum number of nested recursive calls that are inlined") \ develop(intx, NestedInliningSizeRatio, 90, \ "Percentage of prev. allowed inline size in recursive inlining") \ range(0, 100) \ > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Claes Redestad > Sent: Montag, 9. Dezember 2019 11:15 > To: Vladimir Kozlov ; hotspot compiler > > Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel > > Thanks for the review! > > /Claes > > On 2019-12-09 01:52, Vladimir Kozlov wrote: > > Nice finding! Good. > > > > Thanks, > > Vladimir > > > > On 12/8/19 1:44 PM, Claes Redestad wrote: > >> Hi, > >> > >> increasing MaxInlineLevel can substantially improve performance in some > >> benchmarks[1], and has been reported to help applications implemented > in > >> scala in particular. > >> > >> There is always some risk of regressions when tweaking the default > >> inlining settings. I've done a number of experiments to ascertain that > >> the effect of increasing this on a wide array of benchmarks. With 15 all > >> benchmarks tested are show either neutral or positive results, with no > >> observed regression w.r.t. compilation speed or code cache usage. > >> > >> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > >> > >> Thanks! > >> > >> /Claes > >> > >> [1] One http://renaissance.dev sub-benchmark improve by almost 3x > with > >> an increase from 9 to 15. From claes.redestad at oracle.com Mon Dec 9 12:44:19 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 13:44:19 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> Message-ID: <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> Hi, Nils raised this issue in the bug, and while I think it's a fair point I think it's orthogonal to whether or not we can/should tune the default here and now. I think the data we have speaks in favor of doing this tuning for JDK 14, and then re-evaluate with more real-world data for JDK 15, possibly dialing back the C1 defaults. When implementing such flags we can also evaluate if C1 should be even more conservative than it is today. It's also worth thinking about whether or not we should introduce different settings for C1 level 1, 2 and 3.. /Claes On 2019-12-09 12:16, Doerr, Martin wrote: > Hi, > > I think tuning inlining makes sense for C2. > > The problem I see is that C1 uses the same inlining flags. > C1 doesn't have the concept of uncommon traps so it compiles all paths. > That's why I think this change is only good for C2. > > So I suggest separating C1 inlining flags before tuning them for C2 like in the example below (note that new flags require CSR). > Feedback for this idea is welcome. 
> > Best regards, > Martin > > > > diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp > --- a/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 10:26:41 2019 +0100 > +++ b/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 12:08:37 2019 +0100 > @@ -174,6 +174,12 @@ > develop_pd(bool, RoundFPResults, \ > "Indicates whether rounding is needed for floating point results")\ > \ > + product(intx, C1MaxInlineSize, 35, \ > + "The maximum bytecode size of a method to be inlined by C1") \ > + product(intx, C1MaxInlineLevel, 9, \ > + "The maximum number of nested calls that are inlined by C1") \ > + product(intx, C1MaxRecursiveInlineLevel, 1, \ > + "maximum number of nested recursive calls that are inlined") \ > develop(intx, NestedInliningSizeRatio, 90, \ > "Percentage of prev. allowed inline size in recursive inlining") \ > range(0, 100) \ > > > > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Claes Redestad >> Sent: Montag, 9. Dezember 2019 11:15 >> To: Vladimir Kozlov ; hotspot compiler >> >> Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel >> >> Thanks for the review! >> >> /Claes >> >> On 2019-12-09 01:52, Vladimir Kozlov wrote: >>> Nice finding! Good. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/8/19 1:44 PM, Claes Redestad wrote: >>>> Hi, >>>> >>>> increasing MaxInlineLevel can substantially improve performance in some >>>> benchmarks[1], and has been reported to help applications implemented >> in >>>> scala in particular. >>>> >>>> There is always some risk of regressions when tweaking the default >>>> inlining settings. I've done a number of experiments to ascertain that >>>> the effect of increasing this on a wide array of benchmarks. With 15 all >>>> benchmarks tested are show either neutral or positive results, with no >>>> observed regression w.r.t. compilation speed or code cache usage. >>>> >>>> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >>>> >>>> Thanks! >>>> >>>> /Claes >>>> >>>> [1] One http://renaissance.dev sub-benchmark improve by almost 3x >> with >>>> an increase from 9 to 15. From gromero at linux.vnet.ibm.com Mon Dec 9 13:10:29 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 9 Dec 2019 10:10:29 -0300 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters Message-ID: Hi, Could the following change be reviewed please? Bug : https://bugs.openjdk.java.net/browse/JDK-8223968 Webrev: http://cr.openjdk.java.net/~gromero/8223968/v1/ It simply adds a description to the RTM abort counters, helping to understand the RTM statistics output faster and better. The change touches RTM shared code and also RTM tests in order to adapt the regex to the new output format. Thanks a lot. Best regards, Gustavo From tobias.hartmann at oracle.com Mon Dec 9 13:59:23 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 14:59:23 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class Message-ID: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8235452 http://cr.openjdk.java.net/~thartmann/8235452/webrev.00/ We crash during loop verification because a strip mined loop lost its OuterStripMined loop counterpart. Removal of the outer loop is triggered by the new optimization added with JDK-8220376 [1] but it can probably happen in other circumstances as well. 
The strip mined loop also lost its CountedLoopEnd and is therefore malformed. The detailed steps of how this happens are described in the test: http://cr.openjdk.java.net/~thartmann/8235452/webrev.00/test/hotspot/jtreg/compiler/loopstripmining/TestDeadOuterStripMinedLoop.java.html The fix is to not try to verify strip mining if the strip mined loop is malformed. I've added an assert to check that the OuterStripMinedLoop is always removed in this case. The test also triggers a crash in the UseProfiledLoopPredicate code added by JDK-8203197 [2]. I've slightly modified the code in loopPredicate.cpp to only execute PathFrequency::to if the projection matches the uncommon trap if pattern and filed JDK-8235584 [3] to investigate in detail. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8220376 [2] https://bugs.openjdk.java.net/browse/JDK-8203197 [3] https://bugs.openjdk.java.net/browse/JDK-8235584 From rwestrel at redhat.com Mon Dec 9 14:05:42 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 15:05:42 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> Message-ID: <8736dtbnm1.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8235452/webrev.00/ For OuterStripMinedLoopNode, is_valid_counted_loop() is false so the verification never runs when called from a OuterStripMinedLoopNode and the new line you added: 958 } else if (is_OuterStripMinedLoop()) { 959 outer = this->as_OuterStripMinedLoop(); 960 inner = outer->unique_ctrl_out()->as_CountedLoop(); 961 assert(inner->is_valid_counted_loop(), "OuterStripMinedLoop should have been removed"); 962 assert(!is_strip_mined(), "outer loop shouldn't be marked strip mined"); 963 } is unreachable. Roland. From martin.doerr at sap.com Mon Dec 9 14:06:32 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Dec 2019 14:06:32 +0000 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: References: Message-ID: Hi Gustavo, nice improvement! I'd only make the message array static: static const char* _abortX_desc[ABORT_STATUS_LIMIT]; +const char* RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { + "abort instruction ", + "may succeed on retry", + "thread conflict ", + "buffer overflow ", + "debug or trap hit ", + "maximum nested depth" +}; Please update the copyright in the test. Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Montag, 9. Dezember 2019 14:10 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Doerr, Martin ; vladimir.kozlov at oracle.com > Subject: RFR(XS): 8223968: Add abort type description to RTM statistic > counters > > Hi, > > Could the following change be reviewed please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8223968 > Webrev: http://cr.openjdk.java.net/~gromero/8223968/v1/ > > It simply adds a description to the RTM abort counters, helping to understand > the RTM statistics output faster and better. > > The change touches RTM shared code and also RTM tests in order to adapt > the > regex to the new output format. > > Thanks a lot. 
> > Best regards, > Gustavo From tobias.hartmann at oracle.com Mon Dec 9 14:13:08 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 15:13:08 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <8736dtbnm1.fsf@redhat.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> Message-ID: <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Hi Roland, thanks for looking at this! On 09.12.19 15:05, Roland Westrelin wrote: > For OuterStripMinedLoopNode, is_valid_counted_loop() is false so the > verification never runs when called from a OuterStripMinedLoopNode and > the new line you added: > > 958 } else if (is_OuterStripMinedLoop()) { > 959 outer = this->as_OuterStripMinedLoop(); > 960 inner = outer->unique_ctrl_out()->as_CountedLoop(); > 961 assert(inner->is_valid_counted_loop(), "OuterStripMinedLoop should have been removed"); > 962 assert(!is_strip_mined(), "outer loop shouldn't be marked strip mined"); > 963 } > > is unreachable. Right. What about this? http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ Thanks, Tobias From rwestrel at redhat.com Mon Dec 9 14:18:27 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 15:18:27 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> Message-ID: <87wob5a8gc.fsf@redhat.com> Hi Martin, Thanks for reviewing this. > 1. mode != IgnoreStripMined at the beginning > 2. (mode == ControlAroundStripMined && use == sfpt) ||!use->is_reachable_from_root() That's reasonable. I will make the suggested changes. Roland. From rwestrel at redhat.com Mon Dec 9 14:19:51 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 15:19:51 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Message-ID: <87tv69a8e0.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ Yes, good. Roland. From tobias.hartmann at oracle.com Mon Dec 9 14:30:32 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 15:30:32 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <87tv69a8e0.fsf@redhat.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> <87tv69a8e0.fsf@redhat.com> Message-ID: Thanks Roland. Best regards, Tobias On 09.12.19 15:19, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ > > Yes, good. > > Roland. > From rwestrel at redhat.com Mon Dec 9 15:15:59 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 16:15:59 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> Message-ID: <87k175a5sg.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ That looks reasonable to me. 
Roland. From nils.eliasson at oracle.com Mon Dec 9 15:20:46 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 9 Dec 2019 16:20:46 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> Message-ID: Hi, For the record - I agree with Claes point. There are a lot of other things that will be nice to evaluate in a later release, but given that the data for this change looks good - I think we show go ahead. Reviewed, Nils On 2019-12-09 13:44, Claes Redestad wrote: > Hi, > > Nils raised this issue in the bug, and while I think it's a fair point I > think it's orthogonal to whether or not we can/should tune the default > here and now. I think the data we have speaks in favor of doing this > tuning for JDK 14, and then re-evaluate with more real-world data for > JDK 15, possibly dialing back the C1 defaults. > > When implementing such flags we can also evaluate if C1 should be even > more conservative than it is today. It's also worth thinking about > whether or not we should introduce different settings for C1 level 1, 2 > and 3.. > > /Claes > > On 2019-12-09 12:16, Doerr, Martin wrote: >> Hi, >> >> I think tuning inlining makes sense for C2. >> >> The problem I see is that C1 uses the same inlining flags. >> C1 doesn't have the concept of uncommon traps so it compiles all paths. >> That's why I think this change is only good for C2. >> >> So I suggest separating C1 inlining flags before tuning them for C2 >> like in the example below (note that new flags require CSR). >> Feedback for this idea is welcome. >> >> Best regards, >> Martin >> >> >> >> diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp >> --- a/src/hotspot/share/c1/c1_globals.hpp?????? Mon Dec 09 10:26:41 >> 2019 +0100 >> +++ b/src/hotspot/share/c1/c1_globals.hpp?????? Mon Dec 09 12:08:37 >> 2019 +0100 >> @@ -174,6 +174,12 @@ >> ??? develop_pd(bool, >> RoundFPResults,????????????????????????????????????????? \ >> ??????????? "Indicates whether rounding is needed for floating point >> results")\ >> \ >> +? product(intx, C1MaxInlineSize, >> 35,??????????????????????????????????????? \ >> +????????? "The maximum bytecode size of a method to be inlined by >> C1")????? \ >> +? product(intx, C1MaxInlineLevel, >> 9,??????????????????????????????????????? \ >> +????????? "The maximum number of nested calls that are inlined by >> C1")????? \ >> +? product(intx, C1MaxRecursiveInlineLevel, >> 1,?????????????????????????????? \ >> +????????? "maximum number of nested recursive calls that are >> inlined")????? \ >> ??? develop(intx, NestedInliningSizeRatio, >> 90,??????????????????????????????? \ >> ??????????? "Percentage of prev. allowed inline size in recursive >> inlining")? \ >> ??????????? range(0, >> 100)???????????????????????????????????????????????????? \ >> >> >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Claes Redestad >>> Sent: Montag, 9. Dezember 2019 11:15 >>> To: Vladimir Kozlov ; hotspot compiler >>> >>> Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel >>> >>> Thanks for the review! >>> >>> /Claes >>> >>> On 2019-12-09 01:52, Vladimir Kozlov wrote: >>>> Nice finding! Good. 
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/8/19 1:44 PM, Claes Redestad wrote: >>>>> Hi, >>>>> >>>>> increasing MaxInlineLevel can substantially improve performance in >>>>> some >>>>> benchmarks[1], and has been reported to help applications implemented >>> in >>>>> scala in particular. >>>>> >>>>> There is always some risk of regressions when tweaking the default >>>>> inlining settings. I've done a number of experiments to ascertain >>>>> that >>>>> the effect of increasing this on a wide array of benchmarks. With >>>>> 15 all >>>>> benchmarks tested are show either neutral or positive results, >>>>> with no >>>>> observed regression w.r.t. compilation speed or code cache usage. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >>>>> >>>>> Thanks! >>>>> >>>>> /Claes >>>>> >>>>> [1] One http://renaissance.dev sub-benchmark improve by almost 3x >>> with >>>>> an increase from 9 to 15. From rwestrel at redhat.com Mon Dec 9 15:44:22 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 16:44:22 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: References: Message-ID: <87h829a4h5.fsf@redhat.com> Hi Christian, > Before loading and copying the extra data from the MDO to the ciMDO in > ciMethodData::load_extra_data(), the metadata is prepared in a > fixed-point iteration by cleaning all SpeculativeTrapData entries of > methods whose klasses are unloaded [3]. If it encounters such a dead > entry it releases the extra data lock (due to ranking issues) and tries > again later [4]. This release of the lock triggers the bug: There can be > cases where one thread A is waiting in the whitebox API method to get > the extra data lock [2] to clean the extra data for the very same MDO > for which another thread B just released the lock at [4]. If that MDO > actually contained SpeculativeTrapData entries, then thread A cleaned > those but the ciMDO, which thread B is preparing, still contains the > uncleaned old MDO extra data (because thread B only made a snapshot of > the MDO earlier at [5]). Would it be possible to call prepare_data() before the snapshot is taken so the snapshot doesn't contain any entry that are then removed? Roland. From erik.osterlund at oracle.com Mon Dec 9 15:59:39 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 9 Dec 2019 16:59:39 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> Message-ID: <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> Hi Vladimir, On 2019-12-06 16:19, Vladimir Ivanov wrote: > Hi Erik, > > I like your idea. Here's updated version: > ? http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 Looks good! > While browsing the code, I noticed that changes in > G1BarrierSetC2::load_at_resolved() aren't required (need_cpu_mem_bar > is used for oop case). But I decided to keep them to keep it > (relatively) close to C2Access::needs_cpu_membar(). Sounds reasonable. Thanks, /Erik > Best regards, > Vladimir Ivanov > > On 05.12.2019 23:14, Erik ?sterlund wrote: >> Hi, >> >> Could we use the existing IN_NATIVE decorator instead of introducing >> a new decorator that seems to be an alias for the same thing? 
The >> decorator describing its use (IN_NATIVE) says it is for off-heap >> accesses pointing into the heap. We can just remove from the comment >> the part presuming it is a reference. >> >> What do you think? >> >> Thanks, >> /Erik >> >>> On 5 Dec 2019, at 20:16, Vladimir Kozlov >>> wrote: >>> >>> ?CCing to GC group. >>> >>> Looks fine to me but someone from GC land have to look too. >>> >>> I wish we have more concrete indication for off-heap access instead >>> of guessing it based on how we address memory through Unsafe API. >>> >>> Thanks, >>> Vladimir >>> >>>> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>>> There were a number of fixes in C2 support for unsafe accesses >>>> recently which led to additional memory barriers around them. It >>>> improved stability, but in some cases it was redundant. One of >>>> important use cases which regressed is off-heap accesses [1]. The >>>> barriers around them are redundant because they are serialized on >>>> raw memory and don't intersect with any on-heap accesses. >>>> Proposed fix skips memory barriers around unsafe accesses which are >>>> provably off-heap (base == NULL). >>>> It (almost completely) recovers performance on the microbenchmark >>>> provided in JDK-8224182 [1]. >>>> Testing: tier1-6. >>>> Best regards, >>>> Vladimir Ivanov >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >> From vladimir.x.ivanov at oracle.com Mon Dec 9 16:04:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 9 Dec 2019 19:04:01 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> Message-ID: <784f3959-933a-a5c7-beb2-f011c9928ec7@oracle.com> Thanks, Erik! Best regards, Vladimir Ivanov On 09.12.2019 18:59, Erik ?sterlund wrote: > Hi Vladimir, > > On 2019-12-06 16:19, Vladimir Ivanov wrote: >> Hi Erik, >> >> I like your idea. Here's updated version: >> ? http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 > > Looks good! > >> While browsing the code, I noticed that changes in >> G1BarrierSetC2::load_at_resolved() aren't required (need_cpu_mem_bar >> is used for oop case). But I decided to keep them to keep it >> (relatively) close to C2Access::needs_cpu_membar(). > > Sounds reasonable. > > Thanks, > /Erik > >> Best regards, >> Vladimir Ivanov >> >> On 05.12.2019 23:14, Erik ?sterlund wrote: >>> Hi, >>> >>> Could we use the existing IN_NATIVE decorator instead of introducing >>> a new decorator that seems to be an alias for the same thing? The >>> decorator describing its use (IN_NATIVE) says it is for off-heap >>> accesses pointing into the heap. We can just remove from the comment >>> the part presuming it is a reference. >>> >>> What do you think? >>> >>> Thanks, >>> /Erik >>> >>>> On 5 Dec 2019, at 20:16, Vladimir Kozlov >>>> wrote: >>>> >>>> ?CCing to GC group. >>>> >>>> Looks fine to me but someone from GC land have to look too. >>>> >>>> I wish we have more concrete indication for off-heap access instead >>>> of guessing it based on how we address memory through Unsafe API. 
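For readers following along, the "how we address memory" distinction is the shape of the Unsafe call itself: on-heap accesses pass a Java object plus an offset, while off-heap accesses pass a null base together with an absolute address obtained from allocateMemory, which is the base == NULL case the proposed fix keys on. A small, hypothetical sketch using sun.misc.Unsafe (class name and sizes chosen only for illustration):

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class UnsafeAddressingSketch {
        public static void main(String[] args) throws Exception {
            // Reflective access to the Unsafe instance, as commonly done in test/demo code.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe u = (Unsafe) f.get(null);

            // Off-heap: base is null, the second argument is an absolute address.
            // This is the addressing shape that can be proven not to alias on-heap memory.
            long addr = u.allocateMemory(8);
            u.putLong(null, addr, 42L);
            long offHeap = u.getLong(null, addr);
            u.freeMemory(addr);

            // On-heap: base is a Java object, the offset is relative to that object.
            long[] arr = new long[1];
            long baseOffset = u.arrayBaseOffset(long[].class);
            u.putLong(arr, baseOffset, 7L);
            long onHeap = u.getLong(arr, baseOffset);

            System.out.println(offHeap + " " + onHeap);
        }
    }
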
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>>>> There were a number of fixes in C2 support for unsafe accesses >>>>> recently which led to additional memory barriers around them. It >>>>> improved stability, but in some cases it was redundant. One of >>>>> important use cases which regressed is off-heap accesses [1]. The >>>>> barriers around them are redundant because they are serialized on >>>>> raw memory and don't intersect with any on-heap accesses. >>>>> Proposed fix skips memory barriers around unsafe accesses which are >>>>> provably off-heap (base == NULL). >>>>> It (almost completely) recovers performance on the microbenchmark >>>>> provided in JDK-8224182 [1]. >>>>> Testing: tier1-6. >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >>> > From vladimir.x.ivanov at oracle.com Mon Dec 9 16:06:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 9 Dec 2019 19:06:57 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <86F63579-C1DB-469D-8E8F-E8017C28D342@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <86F63579-C1DB-469D-8E8F-E8017C28D342@oracle.com> Message-ID: <0f033409-6fd8-a60b-3eac-850a0040c91e@oracle.com> Thanks for review, John. > I wonder if there is a similar opportunity in > LibraryCallKit::inline_unsafe_load_store > or inline_vector_mem_operation or?inline_unsafe_copyMemory. ?All of those > also form unsafe addresses, and at least some seem to mention NULL_PTR > or IN_HEAP. > Is it worth an explanatory comment or a tracking bug? Good point. I think inline_vector_mem_operation and inline_unsafe_copyMemory can benefit from a similar optimization. Will file an RFE. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Mon Dec 9 16:25:22 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 17:25:22 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> Message-ID: On 09.12.19 16:59, Erik ?sterlund wrote: >> I like your idea. Here's updated version: >> ? http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 > > Looks good! +1 Best regards, Tobias From vladimir.x.ivanov at oracle.com Mon Dec 9 16:26:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 9 Dec 2019 19:26:01 +0300 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: <1c2ed832-2915-5484-0620-5838d8377ec4@oracle.com> Thanks for review, John. > It makes me wonder what?s wrong with the various overloadings > of GraphKit::make_load, that you have to open-code a special call. > If this happens a lot, we should try to figure out a new way to call > GK::make_load. I would have thought there would be an adr_idx > value for C->immutable_memory! > > Feeding from immutable_memory will probably be a good thing to > do in more circumstances as we get more reliably immutable data. Yes, it looks appealing, but there are some cases which need special handling. 
For example, it doesn't respect instance initialization which can manifest as observing partially constructed instance. Best regards, Vladimir Ivanov >> On Dec 5, 2019, at 4:43 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235143 >> >> Thread::currentThread() intrinsic doesn't need memory state: >> though multiple threads can execute same code, "current thread" can't change in the context of a single method activation. So, once it is observed, it's safe to share among all users. >> >> One of the use cases which benefit a lot from such optimization is ownership checks for thread confined resources (fast path check for owner thread to avoid heavy-weight synchronization). >> >> The patch was part of foreign-memaccess branch in Project Panama and showed good performance results on Memory Access API implementation [1]. >> >> Testing: tier1-4 >> >> PS: the optimization should be disabled in Project Loom: the assumption doesn't hold for continuations (in their current form). >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://openjdk.java.net/jeps/370 >> JEP 370: Foreign-Memory Access API (Incubator) > From sandhya.viswanathan at intel.com Mon Dec 9 18:54:32 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 9 Dec 2019 18:54:32 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: Hi Vladimir, It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Thursday, December 05, 2019 4:01 AM To: hotspot compiler Cc: Bhateja, Jatin ; Viswanathan, Sandhya Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235405 Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following operations: - LoadVector - StoreVector - RoundDoubleModeV - AndV - OrV - XorV - MulAddVS2VI - PopCountVI Indiviual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? 
Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html From tom.rodriguez at oracle.com Mon Dec 9 20:10:44 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 9 Dec 2019 12:10:44 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches Message-ID: http://cr.openjdk.java.net/~never/8229377/webrev https://bugs.openjdk.java.net/browse/JDK-8229377 This is a minor improvement to the JVMCI invalidate method to avoid scanning large code caches when invalidating a single nmethod. Instead the nmethod is directly made not_entrant. In general I'm unclear what the benefit of the mark_for_deoptimization/make_marked_nmethods_not_entrant split is. Testing is in progress. JDK-8230884 had been previously duplicated against this because they overlapped a bit, but in the interest of clarity I separated them again. tom From headius at headius.com Mon Dec 9 21:31:00 2019 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 9 Dec 2019 15:31:00 -0600 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: This gets a big +1 for me, but honestly even better would be eliminating the hard inline level limit altogether in favor of other metrics (inlined code size, optimized node count, incremental inlining). Ignoring that for the moment... Anecdotally, I can say that there are many code paths in JRuby that don't inline fully *solely* because of this limit. For example, there are objects in the numeric tower that out of compatibility necessity have constructor paths that get dangerously close to 9 levels deep: calling code-> Fixnum factory -> Fixnum constructor -> Integer -> Numeric -> Object -> BasicObject -> j.l.Object ...and that would just inline into the first level of Ruby code. The last five levels here are mostly just "super()" calls. Ideally we would want to have a few levels of Ruby code inlining together too. If this path doesn't inline, we've got no chance for escape analysis and other optimizations to reduce the transient numeric objects. A question relating to JRuby: how does the inline level interact with MethodHandle/LambdaForm @ForceInline? Does that get around it? ALWAYS? Clearly we see most of our invokedynamic call sites inline, but I have little understanding of the actual cost of the five to 10 to 20 levels of LFs between an indy site and the eventual target. In any case... +1, inline all the things. - Charlie On Sun, Dec 8, 2019 at 3:42 PM Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. > > Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > > Thanks! > > /Claes > > [1] One http://renaissance.dev sub-benchmark improve by almost 3x with > an increase from 9 to 15. 
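As a purely hypothetical illustration (independent of JRuby and of the benchmarks mentioned above) of how the 9 versus 15 limit shows up in practice: with the old default, the innermost calls of a chain like the one below are typically rejected, which -XX:+PrintInlining usually reports as "inlining too deep"; with -XX:MaxInlineLevel=15 the whole chain can inline. The class and method names are invented for this sketch.

    // Hypothetical sketch of a call chain deeper than the old MaxInlineLevel=9.
    // Compare, for example:
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:MaxInlineLevel=9  DeepChainSketch
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:MaxInlineLevel=15 DeepChainSketch
    public class DeepChainSketch {
        static int d12(int x) { return x + 1; }
        static int d11(int x) { return d12(x) + 1; }
        static int d10(int x) { return d11(x) + 1; }
        static int d9(int x)  { return d10(x) + 1; }
        static int d8(int x)  { return d9(x) + 1; }
        static int d7(int x)  { return d8(x) + 1; }
        static int d6(int x)  { return d7(x) + 1; }
        static int d5(int x)  { return d6(x) + 1; }
        static int d4(int x)  { return d5(x) + 1; }
        static int d3(int x)  { return d4(x) + 1; }
        static int d2(int x)  { return d3(x) + 1; }
        static int d1(int x)  { return d2(x) + 1; }

        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 2_000_000; i++) { sum += d1(i); }  // warm up so C2 compiles d1()
            System.out.println(sum);
        }
    }
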
> From gromero at linux.vnet.ibm.com Mon Dec 9 22:05:04 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 9 Dec 2019 19:05:04 -0300 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: References: Message-ID: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> Hi Martin, On 12/09/2019 11:06 AM, Doerr, Martin wrote: > nice improvement! =) Thanks for the quick review! > I'd only make the message array static: > static const char* _abortX_desc[ABORT_STATUS_LIMIT]; > > +const char* RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { > + "abort instruction ", > + "may succeed on retry", > + "thread conflict ", > + "buffer overflow ", > + "debug or trap hit ", > + "maximum nested depth" > +}; oh... sure. Done. > Please update the copyright in the test. Done. v2: http://cr.openjdk.java.net/~gromero/8223968/v2/ Best regards, Gustavo From vladimir.x.ivanov at oracle.com Mon Dec 9 22:11:23 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 01:11:23 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> Hi Sandhya, We need 1 more (R)eview before I can push it. PS: and 8234392 [1] which it depends on. Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-December/036291.html On 09.12.2019 21:54, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Thursday, December 05, 2019 4:01 AM > To: hotspot compiler > Cc: Bhateja, Jatin ; Viswanathan, Sandhya > Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following > operations: > - LoadVector > - StoreVector > - RoundDoubleModeV > - AndV > - OrV > - XorV > - MulAddVS2VI > - PopCountVI > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > From claes.redestad at oracle.com Mon Dec 9 22:33:52 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 23:33:52 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: Thanks for chiming in! 
On 2019-12-09 22:31, Charles Oliver Nutter wrote: > A question relating to JRuby: how does the inline level interact with > MethodHandle/LambdaForm?@ForceInline? Does that get around it? ALWAYS? > Clearly we see most of our invokedynamic call sites inline, but I have > little understanding of the actual cost of the five to 10 to 20 levels > of LFs between an indy site and the eventual target. From my reading of the code, there's a limit of 100 levels of @ForceInline. This seems very hard to ever hit, and would be observable if -XX:+PrintInlining ever printed "MaxForceInlineLevel" as a reason to not inline. AFAIU, and my understanding here is far from perfect, then long before hitting this limit even extreme LF/indy usage will have split deeply nested calls into smaller chunks that we chain together, in an effort to keep very complex shapes byte sized for the JIT and better enable some LF sharing optimizations on the library side. /Claes From sandhya.viswanathan at intel.com Mon Dec 9 22:32:52 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 9 Dec 2019 22:32:52 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> Message-ID: Could we please get one more review on this? This is part of the Generic operand and libjvm size reduction. This work was split into five patches of which three are checked in. The 8235405 and 8234392 are the remaining two. Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Monday, December 09, 2019 2:11 PM To: Viswanathan, Sandhya ; hotspot compiler Cc: Bhateja, Jatin Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations Hi Sandhya, We need 1 more (R)eview before I can push it. PS: and 8234392 [1] which it depends on. Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-December/036291.html On 09.12.2019 21:54, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Thursday, December 05, 2019 4:01 AM > To: hotspot compiler > Cc: Bhateja, Jatin ; Viswanathan, Sandhya > Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following > operations: > - LoadVector > - StoreVector > - RoundDoubleModeV > - AndV > - OrV > - XorV > - MulAddVS2VI > - PopCountVI > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). 
> > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > From vladimir.x.ivanov at oracle.com Mon Dec 9 22:44:31 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 01:44:31 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> Message-ID: > Could we please get one more review on this? > This is part of the Generic operand and libjvm size reduction. > This work was split into five patches of which three are checked in. > The 8235405 and 8234392 are the remaining two. To clarify: as I mentioned in the original email, it's the first batch of merged instructions: >> The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). (I delayed posting the rest for review partly to avoid spamming the alias and to be able to promptly address review feedback.) Considering the size of those patches (~6k LOCs in total), I doubt all 7 can be fully reviewed before Thursday. But IMO it's fine to get as much as we can into 14 and put the rest into 15. Best regards, Vladimir Ivanov > -----Original Message----- > From: Vladimir Ivanov > Sent: Monday, December 09, 2019 2:11 PM > To: Viswanathan, Sandhya ; hotspot compiler > Cc: Bhateja, Jatin > Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > Hi Sandhya, > > We need 1 more (R)eview before I can push it. > > PS: and 8234392 [1] which it depends on. > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-December/036291.html > > On 09.12.2019 21:54, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Thursday, December 05, 2019 4:01 AM >> To: hotspot compiler >> Cc: Bhateja, Jatin ; Viswanathan, Sandhya >> Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235405 >> >> Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following >> operations: >> - LoadVector >> - StoreVector >> - RoundDoubleModeV >> - AndV >> - OrV >> - XorV >> - MulAddVS2VI >> - PopCountVI >> >> Indiviual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual >> >> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >> >> The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). 
>> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> From vladimir.kozlov at oracle.com Mon Dec 9 23:28:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 15:28:33 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> Message-ID: This looks good. Thanks, Vladimir On 12/5/19 1:53 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234392 > > Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() > on x86. > > It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using > Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see > Matcher::vector_width_in_bytes() [1] for details). > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Testing: tier1-4 > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 From vladimir.kozlov at oracle.com Tue Dec 10 00:44:28 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 16:44:28 -0800 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> On 12/5/19 4:01 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers > the following operations: > ? - LoadVector > ? - StoreVector There was difference in usage of vmovdquq vs vmovdqul instructions for 64 bytes wide vectors depending on element size (or number of elements in other word). The only difference in encoding of 2 instructions is value of *vex_w* attribute. Proposed change use only vmovdqul. I want to know rational behind the change. > ? - RoundDoubleModeV Conversion of predicate to assert vround8D_* instructions needs to be explained. Originally they were guarded by UseAVX > 2. The code in match_rule_supported_vector() checks only AVX and not AVX512: http://hg.openjdk.java.net/jdk/jdk/file/2aaa8bcb90a9/src/hotspot/cpu/x86/x86.ad#l1411 Is it because 8234392 added vector_size_supported() check? > ? - AndV > ? - OrV > ? - XorV Above 3 are good. > ? - MulAddVS2VI Both related webrevs are good. > ? - PopCountVI Good. Thanks, Vladimir > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for > reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. 
Generic vector support was reviewed > earlier and the other parts of refactorings in x86.ad? will be posted for review separately (7 more patches pending). > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html From vladimir.kozlov at oracle.com Tue Dec 10 00:52:50 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 16:52:50 -0800 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> References: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> Message-ID: <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> v2 looks good to me. Thanks, Vladimir On 12/9/19 2:05 PM, Gustavo Romero wrote: > Hi Martin, > > On 12/09/2019 11:06 AM, Doerr, Martin wrote: >> nice improvement! > > =) > > Thanks for the quick review! > > >> I'd only make the message array static: >> static const char* _abortX_desc[ABORT_STATUS_LIMIT]; >> >> +const char* RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { >> +? "abort instruction?? ", >> +? "may succeed on retry", >> +? "thread conflict???? ", >> +? "buffer overflow???? ", >> +? "debug or trap hit?? ", >> +? "maximum nested depth" >> +}; > > oh... sure. Done. > > >> Please update the copyright in the test. > > Done. > > > v2: > > http://cr.openjdk.java.net/~gromero/8223968/v2/ > > > Best regards, > Gustavo From vladimir.kozlov at oracle.com Tue Dec 10 01:01:34 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 17:01:34 -0800 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI Message-ID: https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8235539 Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. Passed tier1 where new test runs. Thanks, Vladimir From sandhya.viswanathan at intel.com Tue Dec 10 01:13:05 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 10 Dec 2019 01:13:05 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> Message-ID: Hi Vladimir, Thanks a lot for your review. Please see my response in your email below. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Monday, December 09, 2019 4:44 PM To: hotspot-compiler-dev at openjdk.java.net Cc: Viswanathan, Sandhya Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations On 12/5/19 4:01 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector > operations by merging existing ones. The patch covers the following operations: > ? - LoadVector > ? 
- StoreVector There was difference in usage of vmovdquq vs vmovdqul instructions for 64 bytes wide vectors depending on element size (or number of elements in other word). The only difference in encoding of 2 instructions is value of *vex_w* attribute. Proposed change use only vmovdqul. I want to know rational behind the change. Sandhya >> The AVX512 emovdqul and emovdquq instruction behavior is same, when the instruction is encoded with no mask register, which is the case here. > ? - RoundDoubleModeV Conversion of predicate to assert vround8D_* instructions needs to be explained. Originally they were guarded by UseAVX > 2. The code in match_rule_supported_vector() checks only AVX and not AVX512: http://hg.openjdk.java.net/jdk/jdk/file/2aaa8bcb90a9/src/hotspot/cpu/x86/x86.ad#l1411 Is it because 8234392 added vector_size_supported() check? Sandhya >> Yes. > ? - AndV > ? - OrV > ? - XorV Above 3 are good. > ? - MulAddVS2VI Both related webrevs are good. > ? - PopCountVI Good. Thanks, Vladimir > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/indivi > dual > > As Jatin described, merging is applied only to AD instructions of > similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector > support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad? will be posted for review separately (7 more patches pending). > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-Augu > st/034822.html From john.r.rose at oracle.com Tue Dec 10 01:27:17 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 9 Dec 2019 17:27:17 -0800 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: Reviewed by me also. This is a better way to do vector stuff. Thanks, Vladimir K, for the perceptive questions. ? John From john.r.rose at oracle.com Tue Dec 10 01:53:14 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 9 Dec 2019 17:53:14 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> Message-ID: Reviewed. The rules for NegV[DF] and CMoveV[DF] that remain looks puzzling to me, and would benefit from a brief comment for future readers, explaining why they are there. The NegV rules seem to be removable for the same reason (size limits) that the other rules are removable. If there?s a tricky reason they must be called out explicitly, it would be good to explain. The CMoveV rules probably stem from limited support for CMove, but again a comment would be good. ? John On Dec 5, 2019, at 1:53 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234392 > > Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() on x86. 
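For reference, the extended hook has roughly the following shape (a sketch only, not the webrev itself, using the existing matcher naming and a single illustrative opcode case):

  const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
    if (!match_rule_supported(opcode)) {
      return false;                       // operation not supported on this CPU at all
    }
    // vector_size_supported() folds in the per-type width limits from
    // vector_width_in_bytes(), e.g. 256-bit integral vectors need AVX2 and
    // 512-bit subword vectors need AVX512BW.
    if (!vector_size_supported(bt, vlen)) {
      return false;
    }
    switch (opcode) {
      case Op_NegVF:
        // illustrative opcode-specific restriction: the 512-bit form needs AVX512DQ
        if (vlen == 16 && !VM_Version::supports_avx512dq()) {
          return false;
        }
        break;
      // ... other opcode-specific checks ...
    }
    return true;
  }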
> > It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see Matcher::vector_width_in_bytes() [1] for details). > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Testing: tier1-4 > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 From vladimir.kozlov at oracle.com Tue Dec 10 02:37:05 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 18:37:05 -0800 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> Message-ID: <98701f33-67d9-6bac-765f-9975556a82bd@oracle.com> > Sandhya >> The AVX512 emovdqul and emovdquq instruction behavior is same Okay. No more comments. Reviewed. Thanks, Vladimir On 12/9/19 5:13 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks a lot for your review. Please see my response in your email below. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov > Sent: Monday, December 09, 2019 4:44 PM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Viswanathan, Sandhya > Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > On 12/5/19 4:01 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235405 >> >> Reduce the number of AD instructions needed to implement vector >> operations by merging existing ones. The patch covers the following operations: >> ? - LoadVector >> ? - StoreVector > > There was difference in usage of vmovdquq vs vmovdqul instructions for 64 bytes wide vectors depending on element size (or number of elements in other word). The only difference in encoding of 2 instructions is value of *vex_w* attribute. > Proposed change use only vmovdqul. I want to know rational behind the change. > > Sandhya >> The AVX512 emovdqul and emovdquq instruction behavior is same, when the instruction is encoded with no mask register, which is the case here. > >> ? - RoundDoubleModeV > > Conversion of predicate to assert vround8D_* instructions needs to be explained. > Originally they were guarded by UseAVX > 2. The code in match_rule_supported_vector() checks only AVX and not AVX512: > > http://hg.openjdk.java.net/jdk/jdk/file/2aaa8bcb90a9/src/hotspot/cpu/x86/x86.ad#l1411 > > Is it because 8234392 added vector_size_supported() check? > > Sandhya >> Yes. > >> ? - AndV >> ? - OrV >> ? - XorV > > Above 3 are good. > >> ? - MulAddVS2VI > > Both related webrevs are good. > >> ? - PopCountVI > > Good. > > Thanks, > Vladimir > >> >> Indiviual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/indivi >> dual >> >> As Jatin described, merging is applied only to AD instructions of >> similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >> >> The patch is derived from the inintial version of generic vector >> support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad? will be posted for review separately (7 more patches pending). 
>> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-Augu >> st/034822.html From vladimir.kozlov at oracle.com Tue Dec 10 04:09:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 20:09:21 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: References: Message-ID: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> But it looks like there are testing failures. The idea and changes seems fine to me. The only question I have is why JVMCI code want to deoptimize all related compiler frames immediately? Why marking is not enough? Thanks, Vladimir On 12/9/19 12:10 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8229377/webrev > https://bugs.openjdk.java.net/browse/JDK-8229377 > > This is a minor improvement to the JVMCI invalidate method to avoid scanning large code caches when invalidating a > single nmethod.? Instead the nmethod is directly made not_entrant.? In general I'm unclear what the benefit of the > mark_for_deoptimization/make_marked_nmethods_not_entrant split is. Testing is in progress. > > JDK-8230884 had been previously duplicated against this because they overlapped a bit, but in the interest of clarity I > separated them again. > > tom From vladimir.kozlov at oracle.com Tue Dec 10 04:20:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 20:20:42 -0800 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: <87k175a5sg.fsf@redhat.com> References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> Message-ID: <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> +1 Thanks, Vladimir K On 12/9/19 7:15 AM, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ > > That looks reasonable to me. > > Roland. > From jzaugg at gmail.com Tue Dec 10 06:18:52 2019 From: jzaugg at gmail.com (Jason Zaugg) Date: Tue, 10 Dec 2019 16:18:52 +1000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: On Mon, 9 Dec 2019 at 07:42, Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. > Great to see! A few notes from the Scala perspective (I work on the Scala compiler and library at Lightbend.) Long chains of method calls arise in Scala applications for a few reasons: - Standard library collections have generic bridge methods that arise from refinement of the collection type constructor down the hierarchy. For instance, a generic method returning `Coll[T]` can be overridden in a subclass to return `MyCollection[...]`. 
- The encoding of traits (interfaces with mixin behaviour and storage) and singleton objects indirects calls through forwarder methods (these don't adapt arguments/results) - The collections hierarchy itself is relatively deep, so the chain of super() calls can be problematic (as per Charlie's message about JRuby). - Symbolic method names in the collections API have alphanumeric aliases (e.g. `+=` and `addOne`) These factors tend to conspire to exhaust the default budget. Analysis with JITWatch / JMH etc has led us to refactor the library code to keep important use cases fast (example [1]). I once experimented [2] with a change to Hotspot to consider our bridge-like forwarder methods as exempt from the MaxInlineLevel budget, as is the case for lambda form frames. That change alone delivered most of the performance benefit of doubling MaxInlineLevel with an unmodified JVM. Jason Zaugg [1] https://github.com/scala/bug/issues/11627#issuecomment-514490505 [2] https://gist.github.com/retronym/d27090a68570485c3e329a52db0c7b25 From christian.hagedorn at oracle.com Tue Dec 10 06:57:34 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 07:57:34 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> Message-ID: Thank you Roland and Vladimir for your reviews! Best regards, Christian On 10.12.19 05:20, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 12/9/19 7:15 AM, Roland Westrelin wrote: >> >>> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ >> >> That looks reasonable to me. >> >> Roland. >> From goetz.lindenmaier at sap.com Tue Dec 10 07:39:33 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 10 Dec 2019 07:39:33 +0000 Subject: Please open 8152988: [AOT] Update test batch definitions to include aot-ed java.base module mode into hs-comp testing In-Reply-To: References: Message-ID: Hi, I would like to downport this change. It would be great if the bug could be opened up. http://hg.openjdk.java.net/jdk/jdk/rev/ff10f8f3a583 https://bugs.openjdk.java.net/browse/JDK-8152988 Thanks and best regards, Goetz. From Pengfei.Li at arm.com Tue Dec 10 08:28:57 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 10 Dec 2019 08:28:57 +0000 Subject: [aarch64-port-dev ] crash due to long offset In-Reply-To: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> References: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> Message-ID: Hi Zhuoren, > I also wrote a patch to solve this issue, please also review. > http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.00/jdk13u.pat > ch Thanks for your patch. I (NOT a reviewer) eyeballed your fix and found a probable mistake. In "enc_class aarch64_enc_str(iRegL src, memory mem) %{ ... %}", you have "if (($mem$$index == -1) && ($mem$$disp > 0)& (($mem$$disp & 0x7) != 0) && ($mem$$disp > 255))". Should it be "&&" instead of "&" in the middle? Another question: Is it possible to add the logic into loadStore() or another new function instead of duplicating it everywhere in aarch64.ad? I've also CC'ed this to hotspot-compiler-dev because all hotspot compiler patches (including AArch64 specific) should go through it for review.
-- Thanks, Pengfei From tobias.hartmann at oracle.com Tue Dec 10 08:30:48 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 09:30:48 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> Message-ID: Looks good to me too. Pushed. Best regards, Tobias On 10.12.19 07:57, Christian Hagedorn wrote: > Thank you Roland and Vladimir for your reviews! > > Best regards, > Christian > > On 10.12.19 05:20, Vladimir Kozlov wrote: >> +1 >> >> Thanks, >> Vladimir K >> >> On 12/9/19 7:15 AM, Roland Westrelin wrote: >>> >>>> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ >>> >>> That looks reasonable to me. >>> >>> Roland. >>> From christian.hagedorn at oracle.com Tue Dec 10 08:39:15 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 09:39:15 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> Message-ID: <82d807cb-e192-1ff2-84af-d2ef31278c24@oracle.com> Thank you Tobias for your review! Best regards, Christian On 10.12.19 09:30, Tobias Hartmann wrote: > Looks good to me too. Pushed. > > Best regards, > Tobias > > On 10.12.19 07:57, Christian Hagedorn wrote: >> Thank you Roland and Vladimir for your reviews! >> >> Best regards, >> Christian >> >> On 10.12.19 05:20, Vladimir Kozlov wrote: >>> +1 >>> >>> Thanks, >>> Vladimir K >>> >>> On 12/9/19 7:15 AM, Roland Westrelin wrote: >>>> >>>>> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ >>>> >>>> That looks reasonable to me. >>>> >>>> Roland. >>>> From tobias.hartmann at oracle.com Tue Dec 10 08:45:04 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 09:45:04 +0100 Subject: Please open 8152988: [AOT] Update test batch definitions to include aot-ed java.base module mode into hs-comp testing In-Reply-To: References: Message-ID: <75d120c2-19ea-a180-3680-b884195dccbd@oracle.com> Hi Goetz, done. Best regards, Tobias On 10.12.19 08:39, Lindenmaier, Goetz wrote: > Hi, > > I would like to downport this change. > It would be great if the bug could be opened up. > > http://hg.openjdk.java.net/jdk/jdk/rev/ff10f8f3a583 > https://bugs.openjdk.java.net/browse/JDK-8152988 > > Thanks and best regards, > Goetz. > From rwestrel at redhat.com Tue Dec 10 08:45:07 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 10 Dec 2019 09:45:07 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <87wob5a8gc.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> <87wob5a8gc.fsf@redhat.com> Message-ID: <87eexca7sc.fsf@redhat.com> New webrev with Martin's suggestion: http://cr.openjdk.java.net/~roland/8234350/webrev.01/ Roland. From tobias.hartmann at oracle.com Tue Dec 10 08:53:16 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 09:53:16 +0100 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: References: Message-ID: Hi Vladimir, looks good to me. Please add a bug number to the test. Also, the Expectation.name/value/origin fields are not needed, right? No new webrev required. 
Best regards, Tobias On 10.12.19 02:01, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235539 > > Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag > is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. > > Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. > > Passed tier1 where new test runs. > > Thanks, > Vladimir From tom.rodriguez at oracle.com Tue Dec 10 09:24:10 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Dec 2019 01:24:10 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> Message-ID: Vladimir Kozlov wrote on 12/9/19 8:09 PM: > But it looks like there are testing failures. None of the testing failures look like they have anything to do with these changes. They all have failed recently in other builds from what I can see. > > The idea and changes seems fine to me. The only question I have is why > JVMCI code want to deoptimize all related compiler frames immediately? > Why marking is not enough? I'm not sure what you're asking. mark_for_deoptimization doesn't really do anything on it's own. You have to do the code cache scan to make_not_entrant and then do the stack walks to invalidate any frames which are still using that code. I'm not changing the semantic of the original function either, just skipping a useless full code cache scan. tom > > Thanks, > Vladimir > > On 12/9/19 12:10 PM, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/8229377/webrev >> https://bugs.openjdk.java.net/browse/JDK-8229377 >> >> This is a minor improvement to the JVMCI invalidate method to avoid >> scanning large code caches when invalidating a single nmethod. >> Instead the nmethod is directly made not_entrant.? In general I'm >> unclear what the benefit of the >> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >> Testing is in progress. >> >> JDK-8230884 had been previously duplicated against this because they >> overlapped a bit, but in the interest of clarity I separated them again. >> >> tom From tobias.hartmann at oracle.com Tue Dec 10 09:29:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 10:29:31 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: Hi Christian, as we've discussed offline, this looks good to me. Although the change is quite complex, I think we should target JDK 14 for the reasons that Vladimir already brought up. Also, the entry->outcnt() > 1 condition makes sure this code is only executed in very specific/rare cases. Nevertheless, it would be good to get more reviews. Best regards, Tobias On 28.11.19 10:24, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8233033 > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ > > The C2 compiled code produces a wrong result for 'iFld' in the test case. It is -8 instead of -7. > The loop in the test case is partially peeled and then unswitched. 
The wrong result is produced > because a wrong state is transferred to the interpreter when an uncommon trap is hit in the C2 > compiled code in the fast version of the unswitched loop. > > The problem is when unswitching the loop, we clone the original loop predicates for the slow and > fast version of the loop [1] but we do not account for partially peeled statements that are control > dependent on the loop predicates (i.e. need to be executed after the predicates). As a result, these > are executed before the cloned loop predicates. > > The situation of the test case method PartialPeelingUnswitch::test() is visualized in [2]. IfTrue > 118, the entry control of the original loop which follows right after the loop predicates, has an > output edge to the StoreI 353 node. This node belongs to the "iFld += -7" statement which was > partially peeled before. When creating the slow version of the loop and cloning the predicates in > PhaseIdealLoop::create_slow_version_of_loop(), this control dependency is lost. StoreI 353 is still > dependent on IfTrue 118 instead of IfTrue 472 (fast loop entry control) and IfTrue 476 (slow loop > entry control). The original loop predicates are later removed and thus, when hitting the uncommon > trap in the fast loop, we accidentally executed "iFld += -7" (StoreI 353) already even though the > interpreter assumes C2 has not executed any statements in the loop. As a result, "iFld += -7" is > executed twice in a row which produces a wrong result. > > The fix is to replace the control input of all statements that have a control input from the > original loop entry control (and are not the "loop selection" IfNode) with the fast and slow entry > control, respectively. Since the statements cannot have two control inputs they need to be cloned > together with all following nodes on a path to the loop phi nodes. The output of the last node > before a loop phi on such a path needs to be adjusted to only point to the phi node belonging to the > fast loop. The last node on the cloned path is set to the phi node belonging to the slow loop. The > fix is visualized in [3]. The control input of StoreI 353 is now the entry control of the fast loop > (IfTrue 472) and its output only points to the corresponding Phi 442 of the fast loop. The same was > done for the cloned node StoreI 476 of StoreI 353 for the slow loop. > > This bug can also be reproduced with JDK 11. Should we target this fix to 14 or defer it to 15 > (since it's a more complex one)? > > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/loopUnswitch.cpp#l272 > [2] https://bugs.openjdk.java.net/secure/attachment/85593/wrong_dependencies.png > [3] https://bugs.openjdk.java.net/secure/attachment/85592/fixed_dependencies.png From tobias.hartmann at oracle.com Tue Dec 10 09:34:43 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 10:34:43 +0100 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: <7176592d-4784-0797-4457-463773464536@oracle.com> <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Message-ID: Hi Liu, looks good and trivial to me. I'll sponsor. Best regards, Tobias On 09.12.19 09:17, Liu Xin wrote: > Hi, Tobias,? > > Thanks for your feedback.? > Here is the new webrev.? I update?what you pointed out.? > https://cr.openjdk.java.net/~xliu/8235383/02/webrev/ > > The patch passed hotspot-tier1 for both fastdebug and release builds.? 
> > thanks, > --lx > > > On Sun, Dec 8, 2019 at 11:15 PM Tobias Hartmann > wrote: > > Hi Liu, > > thanks for adding the test. > > We try to avoid bug ids as test names. In this case, I would suggest something like > TestPrintIRDuringConstruction with a compiler.c1 package declaration. Also, the first line of the > copyright header looks wrong (should look like this [1]). > > line 27-29: I don't think you need these lines > line 42: "initiail" -> "initial" > > Thanks, > Tobias > > [1] > http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java > > On 07.12.19 05:26, Liu Xin wrote: > > hi, Tobias,? > > > > Thank you for reviewing it. I add a regression test about it. Could you take a look? > > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > > > thanks, > > > > --lx > > > > > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann > > >> wrote: > > > >? ? ?Hi Liu, > > > >? ? ?your fix looks good to me but could you please add a regression test? > > > >? ? ?Thanks, > >? ? ?Tobias > > > >? ? ?On 06.12.19 08:23, Liu Xin wrote: > >? ? ?> Hi, Reviewers, > >? ? ?> > >? ? ?> Could you review this very simple bugfix for C1? > >? ? ?> JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > >? ? ?> Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > >? ? ?> > >? ? ?> The root cause is some instructions are going to be eliminated, so they are > >? ? ?> not assigned to any valid bci. > >? ? ?> In present of? -XX:+PrintIRDuringConstruction -XX:+Verbose,? C1 will print > >? ? ?> them out and then hit the assert. > >? ? ?> > >? ? ?> Yes, I can twiddle graph_builder to assign right BCIs to them,? but I would > >? ? ?> like to have a more robust InstructionPrinter::print_line. the CR will > >? ? ?> leave blanks in the position of bci. > >? ? ?> Eliminated store for object 0: > >? ? ?> .? ? ? 0? ? a67? ? a58._24 := a54 (L) next > >? ? ?> Eliminated load: > >? ? ?> .? ? ? 0? ? i35? ? a11._24 (I) position > >? ? ?> > >? ? ?> thanks, > >? ? ?> --lx > >? ? ?> > > > From christian.hagedorn at oracle.com Tue Dec 10 09:38:03 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 10:38:03 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: Thank you Tobias for your review! Best regards, Christian On 10.12.19 10:29, Tobias Hartmann wrote: > Hi Christian, > > as we've discussed offline, this looks good to me. Although the change is quite complex, I think we > should target JDK 14 for the reasons that Vladimir already brought up. Also, the entry->outcnt() > 1 > condition makes sure this code is only executed in very specific/rare cases. > > Nevertheless, it would be good to get more reviews. > > Best regards, > Tobias > > On 28.11.19 10:24, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8233033 >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ >> >> The C2 compiled code produces a wrong result for 'iFld' in the test case. It is -8 instead of -7. >> The loop in the test case is partially peeled and then unswitched. The wrong result is produced >> because a wrong state is transferred to the interpreter when an uncommon trap is hit in the C2 >> compiled code in the fast version of the unswitched loop. 
>> >> The problem is when unswitching the loop, we clone the original loop predicates for the slow and >> fast version of the loop [1] but we do not account for partially peeled statements that are control >> dependent on the loop predicates (i.e. need to be executed after the predicates). As a result, these >> are executed before the cloned loop predicates. >> >> The situation of the test case method PartialPeelingUnswitch::test() is visualized in [2]. IfTrue >> 118, the entry control of the original loop which follows right after the loop predicates, has an >> output edge to the StoreI 353 node. This node belongs to the "iFld += -7" statement which was >> partially peeled before. When creating the slow version of the loop and cloning the predicates in >> PhaseIdealLoop::create_slow_version_of_loop(), this control dependency is lost. StoreI 353 is still >> dependent on IfTrue 118 instead of IfTrue 472 (fast loop entry control) and IfTrue 476 (slow loop >> entry control). The original loop predicates are later removed and thus, when hitting the uncommon >> trap in the fast loop, we accidentally executed "iFld += -7" (StoreI 353) already even though the >> interpreter assumes C2 has not executed any statements in the loop. As a result, "iFld += -7" is >> executed twice in a row which produces a wrong result. >> >> The fix is to replace the control input of all statements that have a control input from the >> original loop entry control (and are not the "loop selection" IfNode) with the fast and slow entry >> control, respectively. Since the statements cannot have two control inputs they need to be cloned >> together with all following nodes on a path to the loop phi nodes. The output of the last node >> before a loop phi on such a path needs to be adjusted to only point to the phi node belonging to the >> fast loop. The last node on the cloned path is set to the phi node belonging to the slow loop. The >> fix is visualized in [3]. The control input of StoreI 353 is now the entry control of the fast loop >> (IfTrue 472) and its output only points to the corresponding Phi 442 of the fast loop. The same was >> done for the cloned node StoreI 476 of StoreI 353 for the slow loop. >> >> This bug can also be reproduced with JDK 11. Should we target this fix to 14 or defer it to 15 >> (since it's a more complex one)? >> >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/loopUnswitch.cpp#l272 >> [2] https://bugs.openjdk.java.net/secure/attachment/85593/wrong_dependencies.png >> [3] https://bugs.openjdk.java.net/secure/attachment/85592/fixed_dependencies.png From martin.doerr at sap.com Tue Dec 10 09:43:06 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 Dec 2019 09:43:06 +0000 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> References: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> Message-ID: +1 Thanks, Martin > -----Original Message----- > From: Vladimir Kozlov > Sent: Dienstag, 10. Dezember 2019 01:53 > To: Gustavo Romero ; Doerr, Martin > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8223968: Add abort type description to RTM statistic > counters > > v2 looks good to me. 
> > Thanks, > Vladimir > > On 12/9/19 2:05 PM, Gustavo Romero wrote: > > Hi Martin, > > > > On 12/09/2019 11:06 AM, Doerr, Martin wrote: > >> nice improvement! > > > > =) > > > > Thanks for the quick review! > > > > > >> I'd only make the message array static: > >> static const char* _abortX_desc[ABORT_STATUS_LIMIT]; > >> > >> +const char* > RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { > >> +? "abort instruction?? ", > >> +? "may succeed on retry", > >> +? "thread conflict???? ", > >> +? "buffer overflow???? ", > >> +? "debug or trap hit?? ", > >> +? "maximum nested depth" > >> +}; > > > > oh... sure. Done. > > > > > >> Please update the copyright in the test. > > > > Done. > > > > > > v2: > > > > http://cr.openjdk.java.net/~gromero/8223968/v2/ > > > > > > Best regards, > > Gustavo From goetz.lindenmaier at sap.com Tue Dec 10 09:54:54 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 10 Dec 2019 09:54:54 +0000 Subject: Please open 8152988: [AOT] Update test batch definitions to include aot-ed java.base module mode into hs-comp testing In-Reply-To: <75d120c2-19ea-a180-3680-b884195dccbd@oracle.com> References: <75d120c2-19ea-a180-3680-b884195dccbd@oracle.com> Message-ID: That's great, thanks! Best regards, Goetz. > -----Original Message----- > From: Tobias Hartmann > Sent: Dienstag, 10. Dezember 2019 09:45 > To: Lindenmaier, Goetz ; hotspot compiler > > Subject: Re: Please open 8152988: [AOT] Update test batch definitions to > include aot-ed java.base module mode into hs-comp testing > > Hi Goetz, > > done. > > Best regards, > Tobias > > On 10.12.19 08:39, Lindenmaier, Goetz wrote: > > Hi, > > > > I would like to downport this change. > > It would be great if the bug could be opened up. > > > > http://hg.openjdk.java.net/jdk/jdk/rev/ff10f8f3a583 > > https://bugs.openjdk.java.net/browse/JDK-8152988 > > > > Thanks and best regards, > > Goetz. > > From tobias.hartmann at oracle.com Tue Dec 10 10:31:00 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 11:31:00 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <87eexca7sc.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> <87wob5a8gc.fsf@redhat.com> <87eexca7sc.fsf@redhat.com> Message-ID: <830aca53-29db-23f5-1853-cd1974ac9d60@oracle.com> Hi Roland, looks good to me. Best regards, Tobias On 10.12.19 09:45, Roland Westrelin wrote: > > New webrev with Martin's suggestion: > > http://cr.openjdk.java.net/~roland/8234350/webrev.01/ > > Roland. > From rwestrel at redhat.com Tue Dec 10 10:46:56 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 10 Dec 2019 11:46:56 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: <87blsga25b.fsf@redhat.com> Hi Christian, > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ Is this correct? 364 set_idom(stmt, iffast_pred, dom_depth(stmt)); 366 set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); stmt is a non CFG so it doesn't have an idom but a control? Roland. 
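A minimal sketch of the distinction in question, assuming the variable names from the webrev (only CFG nodes carry dominator-tree entries, data nodes are pinned via set_ctrl()):

  if (stmt->is_CFG()) {
    set_idom(stmt, iffast_pred, dom_depth(stmt));   // CFG nodes live in the dominator tree
  } else {
    set_ctrl(stmt, iffast_pred);                    // data nodes only record their controlling node
  }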
From christian.hagedorn at oracle.com Tue Dec 10 11:43:29 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 12:43:29 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <87blsga25b.fsf@redhat.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: Hi Roland, You're right, it should be set_ctrl() instead. I changed it and added an additional non-CFG sanity assertion check: http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ Best regards, Christian On 10.12.19 11:46, Roland Westrelin wrote: > > Hi Christian, > >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ > > Is this correct? > > 364 set_idom(stmt, iffast_pred, dom_depth(stmt)); > 366 set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); > > stmt is a non CFG so it doesn't have an idom but a control? > > Roland. > From rwestrel at redhat.com Tue Dec 10 12:25:23 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 10 Dec 2019 13:25:23 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: <878snk9xl8.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ Looks good to me. Roland. From nils.eliasson at oracle.com Tue Dec 10 14:00:44 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 10 Dec 2019 15:00:44 +0100 Subject: RFR(S): Clean-up BarrierSetC2 Message-ID: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Hi, I came across a lot of dead code in BarrierSetC2 while debugging a completely separate problem. These are leftovers from legacy barrier implementations that are no longer needed. This is a quick cleanup where unused parts are removed. I tried to keep it as simple as possible to avoid introducing any bugs. There are additional cleanups that can be done in these files later. Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 Please review, Nils Eliasson From aph at redhat.com Tue Dec 10 14:05:25 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 10 Dec 2019 14:05:25 +0000 Subject: [aarch64-port-dev ] crash due to long offset In-Reply-To: References: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> Message-ID: <26ad8b0f-95a6-f126-7ea1-aa59291145d4@redhat.com> On 12/10/19 8:28 AM, Pengfei Li (Arm Technology China) wrote: > Hi Zhuoren, > >> I also wrote a patch to solve this issue, please also review. >> http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.00/jdk13u.pat >> ch > Thanks for your patch. I (NOT a reviewer) eyeballed your fix and found a probable mistake. > > In "enc_class aarch64_enc_str(iRegL src, memory mem) %{ ... %}", you have "if (($mem$$index == -1) && ($mem$$disp > 0)& (($mem$$disp & 0x7) != 0) && ($mem$$disp > 255))". > Should it be "&&" instead of "&" in the middle? > > Another question: Is it possible to add the logic into loadStore() or another new function instead of duplicating it everywhere in aarch64.ad? > > I've also CC'ed this to hotspot-compiler-dev because all hotspot compiler patches (including AArch64 specific) should go through it for review. I'm looking at this. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From claes.redestad at oracle.com Tue Dec 10 14:13:31 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 10 Dec 2019 15:13:31 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Message-ID: <3aedf4b7-2987-6d8b-f308-d922d3160df9@oracle.com> Looks good! /Claes On 2019-12-10 15:00, Nils Eliasson wrote: > Hi, > > I can across a lot of dead code in BarrierSetC2 while debugging a > completely separate problem. This is leftovers from legacy barrier > implementations that no longer is needed. > > This is a quick cleanup where non-used parts are removed. I tried to > keep it as simple as possible to avoid introducing any bugs. There are > additional cleanups that can be done in these files later. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 > > Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 > > > Please review, > > Nils Eliasson > From martin.doerr at sap.com Tue Dec 10 15:29:49 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 Dec 2019 15:29:49 +0000 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <830aca53-29db-23f5-1853-cd1974ac9d60@oracle.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> <87wob5a8gc.fsf@redhat.com> <87eexca7sc.fsf@redhat.com> <830aca53-29db-23f5-1853-cd1974ac9d60@oracle.com> Message-ID: +1 It's better readable this way. Thanks, Martin > -----Original Message----- > From: Tobias Hartmann > Sent: Dienstag, 10. Dezember 2019 11:31 > To: Roland Westrelin ; Doerr, Martin > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && > (use == sfpt || !use->is_reachable_from_root())) failed: missed a node > > Hi Roland, > > looks good to me. > > Best regards, > Tobias > > On 10.12.19 09:45, Roland Westrelin wrote: > > > > New webrev with Martin's suggestion: > > > > http://cr.openjdk.java.net/~roland/8234350/webrev.01/ > > > > Roland. > > From rkennke at redhat.com Tue Dec 10 15:37:00 2019 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 10 Dec 2019 16:37:00 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Message-ID: Looks good to me, too (lots of this stuff was for Shenandoah pre-LRB). Thanks, Roman > Hi, > > I can across a lot of dead code in BarrierSetC2 while debugging a > completely separate problem. This is leftovers from legacy barrier > implementations that no longer is needed. > > This is a quick cleanup where non-used parts are removed. I tried to > keep it as simple as possible to avoid introducing any bugs. There are > additional cleanups that can be done in these files later. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 > > Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 > > > Please review, > > Nils Eliasson > From martin.doerr at sap.com Tue Dec 10 16:14:58 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 Dec 2019 16:14:58 +0000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> Message-ID: Hi Claes, I certainly like to have the improvement for C2. But I can't investigate impact on C1 so quickly. It may be true that increasing inlining depth is not as problematic as increasing MaxInlineSize for C1. So wrt. this issue I abstain. To make sure what we discussed won't get forgotten, I've created https://bugs.openjdk.java.net/browse/JDK-8235673 Feel free to edit or comment. Best regards, Martin > -----Original Message----- > From: Claes Redestad > Sent: Montag, 9. Dezember 2019 13:44 > To: Doerr, Martin ; Vladimir Kozlov > ; hotspot compiler dev at openjdk.java.net> > Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel > > Hi, > > Nils raised this issue in the bug, and while I think it's a fair point I > think it's orthogonal to whether or not we can/should tune the default > here and now. I think the data we have speaks in favor of doing this > tuning for JDK 14, and then re-evaluate with more real-world data for > JDK 15, possibly dialing back the C1 defaults. > > When implementing such flags we can also evaluate if C1 should be even > more conservative than it is today. It's also worth thinking about > whether or not we should introduce different settings for C1 level 1, 2 > and 3.. > > /Claes > > On 2019-12-09 12:16, Doerr, Martin wrote: > > Hi, > > > > I think tuning inlining makes sense for C2. > > > > The problem I see is that C1 uses the same inlining flags. > > C1 doesn't have the concept of uncommon traps so it compiles all paths. > > That's why I think this change is only good for C2. > > > > So I suggest separating C1 inlining flags before tuning them for C2 like in the > example below (note that new flags require CSR). > > Feedback for this idea is welcome. > > > > Best regards, > > Martin > > > > > > > > diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp > > --- a/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 10:26:41 2019 > +0100 > > +++ b/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 12:08:37 2019 > +0100 > > @@ -174,6 +174,12 @@ > > develop_pd(bool, RoundFPResults, \ > > "Indicates whether rounding is needed for floating point results")\ > > \ > > + product(intx, C1MaxInlineSize, 35, \ > > + "The maximum bytecode size of a method to be inlined by C1") \ > > + product(intx, C1MaxInlineLevel, 9, \ > > + "The maximum number of nested calls that are inlined by C1") \ > > + product(intx, C1MaxRecursiveInlineLevel, 1, \ > > + "maximum number of nested recursive calls that are inlined") \ > > develop(intx, NestedInliningSizeRatio, 90, \ > > "Percentage of prev. allowed inline size in recursive inlining") \ > > range(0, 100) \ > > > > > > > > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Claes Redestad > >> Sent: Montag, 9. 
Dezember 2019 11:15 > >> To: Vladimir Kozlov ; hotspot compiler > >> > >> Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel > >> > >> Thanks for the review! > >> > >> /Claes > >> > >> On 2019-12-09 01:52, Vladimir Kozlov wrote: > >>> Nice finding! Good. > >>> > >>> Thanks, > >>> Vladimir > >>> > >>> On 12/8/19 1:44 PM, Claes Redestad wrote: > >>>> Hi, > >>>> > >>>> increasing MaxInlineLevel can substantially improve performance in > some > >>>> benchmarks[1], and has been reported to help applications > implemented > >> in > >>>> scala in particular. > >>>> > >>>> There is always some risk of regressions when tweaking the default > >>>> inlining settings. I've done a number of experiments to ascertain that > >>>> the effect of increasing this on a wide array of benchmarks. With 15 all > >>>> benchmarks tested are show either neutral or positive results, with no > >>>> observed regression w.r.t. compilation speed or code cache usage. > >>>> > >>>> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > >>>> > >>>> Thanks! > >>>> > >>>> /Claes > >>>> > >>>> [1] One http://renaissance.dev sub-benchmark improve by almost 3x > >> with > >>>> an increase from 9 to 15. From sandhya.viswanathan at intel.com Tue Dec 10 16:57:12 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 10 Dec 2019 16:57:12 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: Thanks a lot John. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of John Rose Sent: Monday, December 09, 2019 5:27 PM To: Vladimir Ivanov Cc: hotspot compiler Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations Reviewed by me also. This is a better way to do vector stuff. Thanks, Vladimir K, for the perceptive questions. ? John From vladimir.x.ivanov at oracle.com Tue Dec 10 17:00:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 20:00:32 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> Message-ID: <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> Thanks for the reviews, Vladimir and John. > The rules for NegV[DF] and CMoveV[DF] that remain looks > puzzling to me, and would benefit from a brief comment for > future readers, explaining why they are there. > > The NegV rules seem to be removable for the same reason > (size limits) that the other rules are removable. If there?s a The checks are there to ensure AVX512DQ is present and vlen check limits it to 512-bit vectors: case Op_NegVF: if ((vlen == 16) && (VM_Version::supports_avx512dq() == false)) ret_value = false; break; case Op_NegVD: if ((vlen == 8) && (VM_Version::supports_avx512dq() == false)) ret_value = false; break; Similar checks for byte operations are redundant, because Matcher::vector_width_in_bytes() explicitly checks for AVX512BW: if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) size = (VM_Version::supports_avx512bw()) ? 64 : 32; I can add a comment that AVX512DQ is required, but it doesn't look like an improvement since the checks say exactly the same. > tricky reason they must be called out explicitly, it would be > good to explain. 
The CMoveV rules probably stem from limited > support for CMove, but again a comment would be good. Yes, there are only vcmov8F_reg and vcmov4D_reg present, so the checks are there to avoid matching failures. I'll put a comment that it's an implementation limitation. Best regards, Vladimir Ivanov > On Dec 5, 2019, at 1:53 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234392 >> >> Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() on x86. >> >> It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see Matcher::vector_width_in_bytes() [1] for details). >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Testing: tier1-4 >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 > From vladimir.x.ivanov at oracle.com Tue Dec 10 17:12:35 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 20:12:35 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: <6e7d058f-ac92-45e8-8b6b-5213a55dce4c@oracle.com> Thanks for reviews, Vladimir and John. And thanks for handling the review and answering the questions, Sandhya. Best regards, Vladimir Ivanov On 10.12.2019 04:27, John Rose wrote: > Reviewed by me also. This is a better way to do vector stuff. > > Thanks, Vladimir K, for the perceptive questions. > > ? John > From bob.vandette at oracle.com Tue Dec 10 17:22:28 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 10 Dec 2019 12:22:28 -0500 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: References: Message-ID: <6F62B34F-993A-44AE-BD6E-AB7CCB3F18BF@oracle.com> Looks good to me. Bob. > On Dec 9, 2019, at 8:01 PM, Vladimir Kozlov wrote: > > https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235539 > > Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. > > Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. > > Passed tier1 where new test runs. > > Thanks, > Vladimir From gromero at linux.vnet.ibm.com Tue Dec 10 17:23:04 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 10 Dec 2019 14:23:04 -0300 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: References: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> Message-ID: Thanks a lot Martin and Vladimir for the reviews. Pushed to jdk/jdk. Best regards, Gustavo On 12/10/2019 06:43 AM, Doerr, Martin wrote: > +1 > > Thanks, > Martin > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Dienstag, 10. Dezember 2019 01:53 >> To: Gustavo Romero ; Doerr, Martin >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(XS): 8223968: Add abort type description to RTM statistic >> counters >> >> v2 looks good to me. 
>> >> Thanks, >> Vladimir >> >> On 12/9/19 2:05 PM, Gustavo Romero wrote: >>> Hi Martin, >>> >>> On 12/09/2019 11:06 AM, Doerr, Martin wrote: >>>> nice improvement! >>> >>> =) >>> >>> Thanks for the quick review! >>> >>> >>>> I'd only make the message array static: >>>> static const char* _abortX_desc[ABORT_STATUS_LIMIT]; >>>> >>>> +const char* >> RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { >>>> +? "abort instruction?? ", >>>> +? "may succeed on retry", >>>> +? "thread conflict???? ", >>>> +? "buffer overflow???? ", >>>> +? "debug or trap hit?? ", >>>> +? "maximum nested depth" >>>> +}; >>> >>> oh... sure. Done. >>> >>> >>>> Please update the copyright in the test. >>> >>> Done. >>> >>> >>> v2: >>> >>> http://cr.openjdk.java.net/~gromero/8223968/v2/ >>> >>> >>> Best regards, >>> Gustavo From vladimir.kozlov at oracle.com Tue Dec 10 17:35:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 09:35:08 -0800 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: References: Message-ID: <537cfe7f-8710-3ebe-2e15-c4751326cf79@oracle.com> Thank you, Tobias On 12/10/19 12:53 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Please add a bug number to the test. Also, the Expectation.name/value/origin fields are not needed, > right? No new webrev required. I added @bug. But I would leave fields there to match code in GraalVM. Thanks, Vladimir > > Best regards, > Tobias > > On 10.12.19 02:01, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235539 >> >> Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag >> is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. >> >> Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. >> >> Passed tier1 where new test runs. >> >> Thanks, >> Vladimir From vladimir.kozlov at oracle.com Tue Dec 10 17:35:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 09:35:42 -0800 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: <6F62B34F-993A-44AE-BD6E-AB7CCB3F18BF@oracle.com> References: <6F62B34F-993A-44AE-BD6E-AB7CCB3F18BF@oracle.com> Message-ID: <2c073a26-60cc-56e4-7820-72ebebc7c0f7@oracle.com> Thank you, Bob Vladimir On 12/10/19 9:22 AM, Bob Vandette wrote: > Looks good to me. > > Bob. > >> On Dec 9, 2019, at 8:01 PM, Vladimir Kozlov wrote: >> >> https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235539 >> >> Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. >> >> Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. >> >> Passed tier1 where new test runs. >> >> Thanks, >> Vladimir > From navy.xliu at gmail.com Tue Dec 10 18:19:10 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Tue, 10 Dec 2019 10:19:10 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: <7176592d-4784-0797-4457-463773464536@oracle.com> <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Message-ID: hi, Tobias, Great, thanks! thanks, --lx On Tue, Dec 10, 2019 at 1:34 AM Tobias Hartmann wrote: > Hi Liu, > > looks good and trivial to me. I'll sponsor. 
> > Best regards, > Tobias > > On 09.12.19 09:17, Liu Xin wrote: > > Hi, Tobias, > > > > Thanks for your feedback. > > Here is the new webrev. I update what you pointed out. > > https://cr.openjdk.java.net/~xliu/8235383/02/webrev/ > > > > The patch passed hotspot-tier1 for both fastdebug and release builds. > > > > thanks, > > --lx > > > > > > On Sun, Dec 8, 2019 at 11:15 PM Tobias Hartmann < > tobias.hartmann at oracle.com > > > wrote: > > > > Hi Liu, > > > > thanks for adding the test. > > > > We try to avoid bug ids as test names. In this case, I would suggest > something like > > TestPrintIRDuringConstruction with a compiler.c1 package > declaration. Also, the first line of the > > copyright header looks wrong (should look like this [1]). > > > > line 27-29: I don't think you need these lines > > line 42: "initiail" -> "initial" > > > > Thanks, > > Tobias > > > > [1] > > > http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java > > > > On 07.12.19 05:26, Liu Xin wrote: > > > hi, Tobias, > > > > > > Thank you for reviewing it. I add a regression test about it. > Could you take a look? > > > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > > > > > thanks, > > > > > > --lx > > > > > > > > > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann < > tobias.hartmann at oracle.com > > > > > tobias.hartmann at oracle.com>>> wrote: > > > > > > Hi Liu, > > > > > > your fix looks good to me but could you please add a > regression test? > > > > > > Thanks, > > > Tobias > > > > > > On 06.12.19 08:23, Liu Xin wrote: > > > > Hi, Reviewers, > > > > > > > > Could you review this very simple bugfix for C1? > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > > > > > The root cause is some instructions are going to be > eliminated, so they are > > > > not assigned to any valid bci. > > > > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, > C1 will print > > > > them out and then hit the assert. > > > > > > > > Yes, I can twiddle graph_builder to assign right BCIs to > them, but I would > > > > like to have a more robust InstructionPrinter::print_line. > the CR will > > > > leave blanks in the position of bci. > > > > Eliminated store for object 0: > > > > . 0 a67 a58._24 := a54 (L) next > > > > Eliminated load: > > > > . 
0 i35 a11._24 (I) position > > > > > > > > thanks, > > > > --lx > > > > > > > > > > From john.r.rose at oracle.com Tue Dec 10 18:43:08 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 10:43:08 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> Message-ID: <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> On Dec 10, 2019, at 9:00 AM, Vladimir Ivanov wrote: > > The checks are there to ensure AVX512DQ is present and vlen check limits it to 512-bit vectors: > > case Op_NegVF: > if ((vlen == 16) && (VM_Version::supports_avx512dq() == false)) > ret_value = false; > break; > > case Op_NegVD: > if ((vlen == 8) && (VM_Version::supports_avx512dq() == false)) > ret_value = false; > break; > > Similar checks for byte operations are redundant, because Matcher::vector_width_in_bytes() explicitly checks for AVX512BW: > > if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) > size = (VM_Version::supports_avx512bw()) ? 64 : 32; It?s true that 512bw and 512dq are different branches of 512, so it is permissible to detect them in different places. But? it would seem easier to follow to check both in the one place: if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) size = (VM_Version::supports_avx512bw()) ? 64 : 32; + if (UseAVX > 2 && (bt == T_FLOAT || bt == T_DOUBLE)) + size = (VM_Version::supports_avx512dq()) ? 64 : 32; This is probably an oversimplification, so file it as BS (brain storming). If such a consolidation of sizing logic is possible, it can be done as a separate cleanup. (Side observation: Since Abs and Neg are treated together in the vabsnegf macro-instruction, it seems odd that there would be special processing of NegVF and not AbsVF also. More cleanups to do?) > I can add a comment that AVX512DQ is required, but it doesn't look like an improvement since the checks say exactly the same. Maybe add a comment cross-referencing the two hunks of logic, since they are not parallel in the sources? (For whatever reasons; it doesn?t matter?) It?s important to be able to find all of the vector-sizing logic so it can be analyzed together, even if it must be distributed into different locations. > tricky reason they must be called out explicitly, it would be >> good to explain. The CMoveV rules probably stem from limited >> support for CMove, but again a comment would be good. > > Yes, there are only vcmov8F_reg and vcmov4D_reg present, so the checks are there to avoid matching failures. I'll put a comment that it's an implementation limitation. > Yes, comments. The matcher logic is tricky, and the structure of AVX512 is tricky, and comments would help navigate it all. ? John From vladimir.x.ivanov at oracle.com Tue Dec 10 19:35:31 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 22:35:31 +0300 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes Message-ID: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235688 Reduce the number of AD instructions needed to implement vector operations by merging existing ones. 
The patch covers the following node types: - AddV(B/S/I/L/F/D) - SubVB/.../SubVD - MulVB/.../MulVD Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From vladimir.x.ivanov at oracle.com Tue Dec 10 20:03:03 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 23:03:03 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> Message-ID: <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> >> Similar checks for byte operations are redundant, because >> Matcher::vector_width_in_bytes() explicitly checks for AVX512BW: >> >> ?if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) >> ???size = (VM_Version::supports_avx512bw()) ? 64 : 32; > > It?s true that 512bw and 512dq are different branches of 512, so it > is permissible to detect them in different places. ?But? it would > seem easier to follow to check both in the one place: > > ? ?if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) > ? ? ?size = (VM_Version::supports_avx512bw()) ? 64 : 32; > +? if (UseAVX > 2 && (bt == T_FLOAT || bt == T_DOUBLE)) > + ? ?size = (VM_Version::supports_avx512dq()) ? 64 : 32; > > This is probably an oversimplification, so file it as BS (brain storming). > If such a consolidation of sizing logic is possible, it can be done as > a separate cleanup. Unfortunately, single- and double-precision FP operations are scattered between AVX512F and AVX512DQ, so I doubt it makes sense to limit FP operations to 256-bit vectors when AVX512DQ is not available. It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. Anyway, I got your idea and it makes perfect sense to me to collect such ideas. > (Side observation: ?Since Abs and Neg are treated together in the?vabsnegf > macro-instruction, it seems odd that there would be special processing > of NegVF and not AbsVF also. ?More cleanups to do?) Yes, it's a first round and there are other opportunities left. More will come after the patches are upstreamed. >> I can add a comment that AVX512DQ is required, but it doesn't look >> like an improvement since the checks say exactly the same. > > Maybe add a comment cross-referencing the two hunks of logic, > since they are not parallel in the sources? ?(For whatever reasons; > it doesn?t matter?) ?It?s important to be able to find all of the > vector-sizing logic so it can be analyzed together, even if it must > be distributed into different locations. 
> >> tricky reason they must be called out explicitly, it would be >>> good to explain. ?The CMoveV rules probably stem from limited >>> support for CMove, but again a comment would be good. >> >> Yes, there are only vcmov8F_reg and vcmov4D_reg present, so the checks >> are there to avoid matching failures. I'll put a comment that it's an >> implementation limitation. >> > > Yes, comments. ?The matcher logic is tricky, and the structure > of AVX512 is tricky, and comments would help navigate it all. Yes, the complexity (very) quickly adds up... (Or "multiplies up"?..) Best regards, Vladimir Ivanov From ekaterina.pavlova at oracle.com Tue Dec 10 20:08:38 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Tue, 10 Dec 2019 12:08:38 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: References: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com> Message-ID: Igor, please see new webrev here: http://cr.openjdk.java.net/~epavlova//8215728/webrev.01/ thanks, -katya On 11/13/19 8:59 PM, Igor Ignatyev wrote: > if GraalUnitTestLauncher has to be run w/ -XX:+EnableJVMCI, I guess we just have to switch back to othervm mode. > -- Igor > >> On Nov 13, 2019, at 8:52 PM, Ekaterina Pavlova wrote: >> >> -XX:+EnableJVMCI is already passed to the spawn JVM by GraalUnitTestLauncher. >> However -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so >> getModuleExports() function works properly for graal modules. >> Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in jtreg >> directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it. >> See also discussion regarding this issue in JDK-8216551. >> >> Anyway, I understand the point regarding tier1 and will see what can be done. >> >> thanks, >> -katya >> >> On 11/13/19 1:45 PM, Igor Ignatyev wrote: >>> all tier1 groups are expected to runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawn JVM, and updated the test descriptions (and generateTests.sh) to require JVM w/ jvmci feature (@requires vm.jvmci)? then this will be a proper tier1 group. >>> -- Igor >>>> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova wrote: >>>> >>>> The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group. >>>> >>>> On 11/13/19 1:30 PM, Igor Ignatyev wrote: >>>>> Hi Katya, >>>>> shouldn't this group be also into tier1_compiler group? >>>>> -- Igor >>>>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it >>>>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on >>>>>> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of >>>>>> time to execute. 
>>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 >>>>>> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html >>>>>> testing: tier1 >>>>>> >>>>>> >>>>>> thanks, >>>>>> -katya >>>> >> > From tom.rodriguez at oracle.com Tue Dec 10 20:24:46 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Dec 2019 12:24:46 -0800 Subject: RFR(S) 8229961: Assert failure in compiler/graalunit/HotspotTest.java Message-ID: http://cr.openjdk.java.net/~never/8229961/webrev https://bugs.openjdk.java.net/browse/JDK-8229961 The JVMCI InstalledCode object maintains a link back to the CodeBlob it's associated with. In several places JVMCI extracts the CodeBlob* or nmethod* from the InstalledCode and the operates on it. In most other cases where an nmethod is being examined there are external factors keeping it alive but in this case there aren't. The actions of the concurrent sweeper can transition or potentially free the nmethod while it's being used. Getting all the way to freeing would take a very adversarial schedule but JVMCI should provide more safety for the these code paths. This fix adds an nmethodLocker to these usages and through careful use of locks attempts to ensure that the resulting locked nmethod is alive and locked when it's returned. I had to modify some of the jtreg tests because they were badly abusing the InstalledCode API and violating some assumptions about the relationship between nmethods and their wrapper InstalledCode objects. Testing was clean apart from a test compilation problem and existing issue. I'm resubmitting with the fixed jtreg test. From john.r.rose at oracle.com Tue Dec 10 20:32:26 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 12:32:26 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> Message-ID: <273D2CF0-F8F5-4542-8770-22EF728529C4@oracle.com> Reviewed. I concur with the occasional simplification of the printed formats. (The combined diff hunks are epic diff salad; such is life. Despite all that it reads OK in Emacs diff-auto-refine-mode. And the individual diffs are very helpful as a backup, especially for byte multiply.) ? John > On Dec 10, 2019, at 11:35 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235688 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following node types: > - AddV(B/S/I/L/F/D) > - SubVB/.../SubVD > - MulVB/.../MulVD > > Individual patches: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual Typo somewhere: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/indiviual/ > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? 
> > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From john.r.rose at oracle.com Tue Dec 10 20:34:39 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 12:34:39 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> Message-ID: On Dec 10, 2019, at 12:03 PM, Vladimir Ivanov wrote: > >> This is probably an oversimplification, so file it as BS (brain storming). >> If such a consolidation of sizing logic is possible, it can be done as >> a separate cleanup. > > Unfortunately, single- and double-precision FP operations are scattered between AVX512F and AVX512DQ, so I doubt it makes sense to limit FP operations to 256-bit vectors when AVX512DQ is not available. > > It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. Yeah, I saw that coming after I visited the trusty intrinsics guide https://software.intel.com/sites/landingpage/IntrinsicsGuide/ > > Anyway, I got your idea and it makes perfect sense to me to collect such ideas. Thanks! Next review, please? From igor.ignatyev at oracle.com Tue Dec 10 21:33:13 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 10 Dec 2019 13:33:13 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: References: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com> Message-ID: <15931779-635A-452F-810D-C4A86FB79B5B@oracle.com> LGTM. -- Igor > On Dec 10, 2019, at 12:08 PM, Ekaterina Pavlova wrote: > > Igor, > please see new webrev here: > http://cr.openjdk.java.net/~epavlova//8215728/webrev.01/ > > thanks, > -katya > > On 11/13/19 8:59 PM, Igor Ignatyev wrote: >> if GraalUnitTestLauncher has to be run w/ -XX:+EnableJVMCI, I guess we just have to switch back to othervm mode. >> -- Igor >>> On Nov 13, 2019, at 8:52 PM, Ekaterina Pavlova wrote: >>> >>> -XX:+EnableJVMCI is already passed to the spawn JVM by GraalUnitTestLauncher. >>> However -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so >>> getModuleExports() function works properly for graal modules. >>> Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in jtreg >>> directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it. >>> See also discussion regarding this issue in JDK-8216551. >>> >>> Anyway, I understand the point regarding tier1 and will see what can be done. >>> >>> thanks, >>> -katya >>> >>> On 11/13/19 1:45 PM, Igor Ignatyev wrote: >>>> all tier1 groups are expected to runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawn JVM, and updated the test descriptions (and generateTests.sh) to require JVM w/ jvmci feature (@requires vm.jvmci)? then this will be a proper tier1 group. 
>>>> -- Igor >>>>> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova wrote: >>>>> >>>>> The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group. >>>>> >>>>> On 11/13/19 1:30 PM, Igor Ignatyev wrote: >>>>>> Hi Katya, >>>>>> shouldn't this group be also into tier1_compiler group? >>>>>> -- Igor >>>>>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it >>>>>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on >>>>>>> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of >>>>>>> time to execute. >>>>>>> >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 >>>>>>> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html >>>>>>> testing: tier1 >>>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> -katya >>>>> >>> > From richard.reingruber at sap.com Tue Dec 10 21:45:28 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 10 Dec 2019 21:45:28 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Message-ID: Hi, I would like to get reviews please for http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ Corresponding RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the change is being tested at SAP since I posted the first RFR some months ago. The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI agents request capabilities that allow them to access local variable values. E.g. if you start-up with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right from the beginning, well before a debugger attaches -- if ever one should do so. With the enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based optimizations are reverted just before an agent acquires the reference to an object. In the JBS item you'll find more details. Thanks, Richard. [1] Experimental fix for JDK-8214584 based on JDK-8227745 http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch From vladimir.kozlov at oracle.com Tue Dec 10 21:56:04 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 13:56:04 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> Message-ID: <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> On 12/10/19 11:35 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235688 > > Reduce the number of AD instructions needed to implement vector > operations by merging existing ones. The patch covers the following node > types: > ? - AddV(B/S/I/L/F/D) > ? - SubVB/.../SubVD > ? 
- MulVB/.../MulVD > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual The link does not work because directory name on server is 'indiviual' There is inconsistency in method names and comments string in some cases (I expects that number '2' is removed from names as in most cases): instruct vadd2L(vec dst, vec src) %{ ... format %{ "paddq $dst,$src\t! add packed2L" %} format %{ "addpd $dst,$src\t! add packed2D" %} format %{ "psubq $dst,$src\t! sub packed2L" %} format %{ "subpd $dst,$src\t! sub packed2D" %} format %{ "vsubpd $dst,$src1,$src2\t! sub packed2D" %} format %{ "vsubpd $dst,$src,$mem\t! sub packed2D" %} format %{ "mulpd $dst,$src\t! mul packed2D" %} The rest seems fine. I agree with short formats in webrev.12.vmul_byte. Thanks, Vladimir > > As Jatin described, merging is applied only to AD instructions of > similar shape. There are some more opportunities for reduction/merging > left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support > [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > > [2] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html > From vladimir.kozlov at oracle.com Tue Dec 10 22:14:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 14:14:21 -0800 Subject: RFR(S) 8229961: Assert failure in compiler/graalunit/HotspotTest.java In-Reply-To: References: Message-ID: Looks good. Thank you for fixing JVMCI tests! Vladimir On 12/10/19 12:24 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8229961/webrev > https://bugs.openjdk.java.net/browse/JDK-8229961 > > The JVMCI InstalledCode object maintains a link back to the CodeBlob > it's associated with.? In several places JVMCI extracts the CodeBlob* or > nmethod* from the InstalledCode and the operates on it.? In most other > cases where an nmethod is being examined there are external factors > keeping it alive but in this case there aren't.? The actions of the > concurrent sweeper can transition or potentially free the nmethod while > it's being used.? Getting all the way to freeing would take a very > adversarial schedule but JVMCI should provide more safety for the these > code paths. > > This fix adds an nmethodLocker to these usages and through careful use > of locks attempts to ensure that the resulting locked nmethod is alive > and locked when it's returned.? I had to modify some of the jtreg tests > because they were badly abusing the InstalledCode API and violating some > assumptions about the relationship between nmethods and their wrapper > InstalledCode objects. > > Testing was clean apart from a test compilation problem and existing > issue.? I'm resubmitting with the fixed jtreg test. 
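An illustration of the keep-alive pattern described in the 8229961 review above -- a sketch assuming the usual nmethodLocker / CodeCache_lock facilities, not the webrev code; the helper name and the way the code handle is decoded are invented for the example:

    // Sketch only: look up the nmethod behind an InstalledCode-style handle and
    // pin it so the sweeper cannot transition or free it while the caller uses it.
    static nmethod* acquire_nmethod_sketch(jlong code_handle, nmethodLocker& keepalive) {
      // Hold CodeCache_lock while touching the blob so it cannot be flushed
      // before the locker is armed.
      MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
      CodeBlob* cb = (CodeBlob*) (address) code_handle;        // hypothetical decoding of the handle
      nmethod* nm = (cb == NULL) ? NULL : cb->as_nmethod_or_null();
      if (nm != NULL && nm->is_alive()) {
        keepalive.set_code(nm);   // pin is released when the locker goes out of scope
        return nm;                // safe to inspect outside the lock while pinned
      }
      return NULL;                // caller must treat the code as already invalidated
    }

The point of the pattern is that the liveness check and the pinning happen under the same lock, so the returned nmethod cannot have been swept in between.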
From john.r.rose at oracle.com Tue Dec 10 22:17:07 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 14:17:07 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> Message-ID: <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> Nice catch! > On Dec 10, 2019, at 1:56 PM, Vladimir Kozlov wrote: > > There is inconsistency in method names and comments string in some cases (I expects that number '2' is removed from names as in most cases): From vladimir.x.ivanov at oracle.com Tue Dec 10 22:29:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 01:29:01 +0300 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes Message-ID: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235719 Merge AD instructions for the following vector nodes: - LShiftV*, RShiftV*, URShiftV* - AbsV* - NegV* Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From vladimir.x.ivanov at oracle.com Tue Dec 10 22:35:51 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 01:35:51 +0300 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> Message-ID: <6ef57374-a8db-6128-3de5-a04ab8a848e8@oracle.com> Thanks for reviews, Vladimir and John. Updated webrev (sorry, no individual patches this time): http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.01 Best regards, Vladimir Ivanov On 11.12.2019 01:17, John Rose wrote: > Nice catch! > >> On Dec 10, 2019, at 1:56 PM, Vladimir Kozlov wrote: >> >> There is inconsistency in method names and comments string in some cases (I expects that number '2' is removed from names as in most cases): > From vladimir.x.ivanov at oracle.com Tue Dec 10 22:41:06 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 01:41:06 +0300 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <273D2CF0-F8F5-4542-8770-22EF728529C4@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <273D2CF0-F8F5-4542-8770-22EF728529C4@oracle.com> Message-ID: > I concur with the occasional simplification of the printed formats. 
It would be nice for ADLC to automatically generate AD instruction description when "format" declaration is omitted. Just printing instruction name and operand values should be as informative most of the time. Best regards, Vladimir Ivanov > (The combined diff hunks are epic diff salad; such is life. > Despite all that it reads OK in Emacs diff-auto-refine-mode. > And the individual diffs are very helpful as a backup, > especially for byte multiply.) > > -- John > >> On Dec 10, 2019, at 11:35 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235688 >> >> Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following node types: >> - AddV(B/S/I/L/F/D) >> - SubVB/.../SubVD >> - MulVB/.../MulVD >> >> Individual patches: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual > > Typo somewhere: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/indiviual/ >> >> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >> >> The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. >> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html
From john.r.rose at oracle.com Tue Dec 10 23:31:13 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 15:31:13 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> Message-ID: Actually I have one more comment about the new classification logic: The "ret_value" idiom is terrible. I see a function which is complex, with something like this at the top: if (... simple size check ...) { ret_value = false; // size not supported } ... I want to read something more decisive like this: if (... simple size check ...) { return false; // size not supported } ... The "ret_value" thingy adds only noise, and no clarity. It is more than an annoyance, hence my comment here. The problem is that if I want to understand the quick check above, I have to scroll down *past all the other checks* to see if some joker does "ret_value = true" before the "return ret_value", subverting my understanding of the code. Basically, the "ret_value" nonsense makes the code impossible to break into understandable parts. -- Grumpy John On Dec 10, 2019, at 12:34 PM, John Rose wrote: > >> It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. > > Yeah, I saw that coming after I visited the trusty intrinsics guide > https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >> >> Anyway, I got your idea and it makes perfect sense to me to collect such ideas.
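For readers following the 8234392 thread, this is the early-return shape being asked for, assembled from the checks quoted earlier in the thread -- a sketch only, not the webrev code, and the element-typed signature is assumed from the RFE title:

    const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
      if (!match_rule_supported(opcode)) {
        return false;   // rejected regardless of vector length
      }
      switch (opcode) {
        case Op_AbsVF:
        case Op_NegVF:
          if (vlen == 16 && !VM_Version::supports_avx512dq()) {
            return false;   // 512-bit single-precision forms need AVX512DQ
          }
          break;
        case Op_AbsVD:
        case Op_NegVD:
          if (vlen == 8 && !VM_Version::supports_avx512dq()) {
            return false;   // 512-bit double-precision forms need AVX512DQ
          }
          break;
        default:
          break;
      }
      return true;   // remaining sizing constraints come from vector_width_in_bytes()
    }

Each unsupported combination bails out immediately, so a reader of the first check never has to scan the rest of the function for a later assignment to ret_value.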
> From rkennke at redhat.com Tue Dec 10 23:57:48 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Dec 2019 00:57:48 +0100 Subject: RFR: 8235729: Shenandoah: Remove useless casting to non-constant Message-ID: <65746d9b-1b54-dd75-d340-a9f8e76929be@redhat.com> We have a couple of code-paths in Shenandoah's C2 barrier code that cast values to non-constant. This appears to be a left-over from pre-LRB barrier scheme. This patch removes those casts as well as the relevant methods in type.hpp/cpp. Shenandoah has been the only user of this. Bug: https://bugs.openjdk.java.net/browse/JDK-8235729 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8235729/webrev.00/ Testing: hotspot_gc_shenandoah, ctw-tests with default and traversal mode Can I please get a review of the change? Thanks, Roman From john.r.rose at oracle.com Wed Dec 11 00:00:50 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 16:00:50 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> Message-ID: Thank you, reviewed. For consistency, I?d expect to see the AD file mention vshiftq instead of in or in addition to the very specific evpsraq. Maybe actually call vshiftq (consistent with other parts of AD) and comment that it calls evpsraq? I had to look inside the macro assembler to verify that evpsraq was properly aligned with the other cases. Or just leave a comment saying this is what vshiftq would do also, like the other instructions. For vabsnegF, I suggest adding a comment here: + predicate(n->as_Vector()->length() == 2 || + // case 4 is handled as a 1-operand instruction by vabsneg4F + n->as_Vector()->length() == 8 || + n->as_Vector()->length() == 16); ? Commentator John On Dec 10, 2019, at 2:29 PM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235719 > > Merge AD instructions for the following vector nodes: > - LShiftV*, RShiftV*, URShiftV* > - AbsV* > - NegV* > > Individual patches: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From gromero at linux.vnet.ibm.com Wed Dec 11 00:04:31 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 10 Dec 2019 21:04:31 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 Message-ID: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Hi, Could the following change be reviewed please? 
Bug : https://bugs.openjdk.java.net/browse/JDK-8234599 Webrev: http://cr.openjdk.java.net/~gromero/8234599/v1/ POWER9 does not have any cache management or barrier instructions aimed specifically to deal with persistent memory (NVDIMM), hence in Power ISA v3.0 there are no instructions like data cache flush/store or sync instructions specific to persistent memory. Nonetheless, in some cases (like through hardware emulation) POWER9 can support NVDIMM and if Linux supports DAX (direct mapping of persistent memory) and a /dev/pmem device is available, it's possible to use data cache and sync instructions in the ISA (which are not explicitly aimed at persistent memory) on a memory region backed by DAX (i.e. mapped using the new mmap() flag MAP_SYNC for persistent memory [1]), so these instructions will have the same semantics as the instructions for data cache flush / store and sync on other architectures supporting NVDIMM and that have explicit instructions to deal with persistent memory. This change adds support for JEP-352 on POWER9 using 'dcbst' plus a sync instruction to sync data to memory backed by DAX (to persistent memory) when that's required. Moreover, that change also paves the way for supporting NVDIMM on future Power CPUs that might have new data cache management and barrier instructions explicitly to deal with persistent memory. The change was developed and tested using a P9 QEMU VM (emulation) with NVDIMM support. For details on how to set up a proper QEMU + Linux kernel w/ a /dev/pmem device configured, please see the recipe in [2]. The JVM on a POWER9 machine with a /dev/pmem device properly set and with that change applied is able to pass the test for JEP-352 [3]. The JVM is also able to pass all tests of the Mashona library [4]. When DAX is not supported, like on POWER8 and POWER9 w/o DAX support, the OS won't support mmap()'s MAP_SYNC flag, so the kernel will return EOPNOTSUPP when code tries to allocate memory backed by a persistent memory device, and the JVM will get a "java.io.IOException: Operation not supported". Naturally, on machines that don't support writebacks or pmem, supports_data_cache_line_flush() will return false, but even in that case the JVM will hit an EOPNOTSUPP and get a "java.io.IOException: Operation not supported" sooner, before it has the chance to try to emit any writeback + sync instructions. Thank you. Best regards, Gustavo [1] http://man7.org/linux/man-pages/man2/mmap.2.html [2] https://github.com/gromero/nvdimm [3] http://hg.openjdk.java.net/jdk/jdk/file/336885e766af/test/jdk/java/nio/MappedByteBuffer/PmemTest.java [4] https://github.com/jhalliday/mashona
From vladimir.kozlov at oracle.com Wed Dec 11 00:14:17 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 16:14:17 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> Message-ID: On 12/10/19 1:24 AM, Tom Rodriguez wrote: > > > Vladimir Kozlov wrote on 12/9/19 8:09 PM: >> But it looks like there are testing failures. > > None of the testing failures look like they have anything to do with > these changes. They all have failed recently in other builds from what > I can see. You are right. I was concerned about 2 kitchensink failures but they are present in current code too based on failure history. > >> >> The idea and changes seems fine to me. The only question I have is why >> JVMCI code want to deoptimize all related compiler frames immediately? >> Why marking is not enough?
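An aside on the MAP_SYNC mapping described in the PPC64/JEP-352 request above: a minimal user-space sketch, independent of the JVM, of how a file on a DAX filesystem is mapped so that a cache-line writeback plus a fence makes stores durable. The file path is an assumption, error handling is reduced to the failure case mentioned above, and MAP_SYNC / MAP_SHARED_VALIDATE may require a recent kernel and glibc (or <linux/mman.h>).

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    int main() {
      int fd = open("/mnt/pmem/test", O_CREAT | O_RDWR, 0666);   // file on a DAX-mounted fs (assumed path)
      if (fd < 0 || ftruncate(fd, 4096) != 0) { perror("open/ftruncate"); return 1; }
      // MAP_SYNC is only valid together with MAP_SHARED_VALIDATE; without DAX the
      // kernel fails with EOPNOTSUPP, which the JDK surfaces as
      // "java.io.IOException: Operation not supported".
      void* p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
      if (p == MAP_FAILED) { fprintf(stderr, "mmap: %s\n", strerror(errno)); return 1; }
      memcpy(p, "hello", 5);
      // After a cache-line writeback (dcbst on POWER9, clwb/clflushopt on x86) plus
      // the architecture's fence, the store is persistent without an msync() call.
      close(fd);
      return 0;
    }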
> > I'm not sure what you're asking.? mark_for_deoptimization doesn't really > do anything on it's own.? You have to do the code cache scan to > make_not_entrant and then do the stack walks to invalidate any frames > which are still using that code.? I'm not changing the semantic of the > original function either, just skipping a useless full code cache scan. Yes, right. I am confusing because not in all places we do this sequence: mark_for_deoptimization, make_not_entrant, patch frames. Patching frames are not always at the same place as we do marking. I also noticed that you changed under which locks make_not_entrant is done. May be better to pass nmethod into deoptimize_all_marked() and do it there (by default pass NULL)? Thanks, Vladimir > > tom > >> >> Thanks, >> Vladimir >> >> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>> http://cr.openjdk.java.net/~never/8229377/webrev >>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>> >>> This is a minor improvement to the JVMCI invalidate method to avoid >>> scanning large code caches when invalidating a single nmethod. >>> Instead the nmethod is directly made not_entrant.? In general I'm >>> unclear what the benefit of the >>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>> Testing is in progress. >>> >>> JDK-8230884 had been previously duplicated against this because they >>> overlapped a bit, but in the interest of clarity I separated them again. >>> >>> tom From vladimir.kozlov at oracle.com Wed Dec 11 00:18:15 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 16:18:15 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <6ef57374-a8db-6128-3de5-a04ab8a848e8@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> <6ef57374-a8db-6128-3de5-a04ab8a848e8@oracle.com> Message-ID: <3d787888-4031-8c0c-8d9f-fe9e9e588fd8@oracle.com> Good. Thanks, Vladimir On 12/10/19 2:35 PM, Vladimir Ivanov wrote: > Thanks for reviews, Vladimir and John. > > Updated webrev (sorry, no individual patches this time): > > ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.01 > > Best regards, > Vladimir Ivanov > > On 11.12.2019 01:17, John Rose wrote: >> Nice catch! >> >>> On Dec 10, 2019, at 1:56 PM, Vladimir Kozlov >>> wrote: >>> >>> There is inconsistency in method names and comments string in some >>> cases (I expects that number '2' is removed from names as in most >>> cases): >> From smita.kamath at intel.com Wed Dec 11 01:41:12 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Wed, 11 Dec 2019 01:41:12 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Message-ID: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> Hi, As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 Link to webrev : http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. 
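A scalar illustration of the double-precision shift these instructions perform per vector lane -- a sketch, not code from the webrev, and the real BigInteger shift also handles whole-word offsets and edge cases omitted here:

    #include <cstdint>
    #include <vector>

    // Shift a big-integer magnitude (most significant word first) left by n bits,
    // 0 <= n < 32. Each result word is the top half of the 64-bit concatenation of
    // a word with its lower neighbour, shifted by n -- analogous to the per-element
    // operation a double-precision (funnel) shift applies to a whole vector at once.
    std::vector<uint32_t> shift_left_sketch(const std::vector<uint32_t>& mag, unsigned n) {
      std::vector<uint32_t> res(mag.size());
      for (std::size_t i = 0; i < mag.size(); i++) {
        uint64_t pair = ((uint64_t) mag[i] << 32) | (i + 1 < mag.size() ? mag[i + 1] : 0);
        res[i] = (uint32_t) ((pair << n) >> 32);   // high 32 bits of the shifted pair
      }
      return res;
    }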
[1] https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. 2C 5-471) [2] https://software.intel.com/en-us/articles/intel-software-development-emulator Regards, Smita Kamath From vladimir.kozlov at oracle.com Wed Dec 11 02:48:40 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 18:48:40 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> Message-ID: <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> In general I don't like using switches in this changes. In most examples you have only 2 instructions to choose from which could be done with 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and never will be hit - you should hit first checks in supported vector size code. I may prefer to see 2 AD instructions as you had in previous changes. In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. Why you need predicate for vabsnegD ? Other length is not supported anyway. Vladimir On 12/10/19 2:29 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235719 > > Merge AD instructions for the following vector nodes: > ? - LShiftV*, RShiftV*, URShiftV* > ? - AbsV* > ? - NegV* > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of > similar shape. There are some more opportunities for reduction/merging > left, but they are deliberately left out for future work. > > The patch is derived from the initial version of generic vector support > [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > > [2] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html > From david.holmes at oracle.com Wed Dec 11 07:02:31 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Dec 2019 17:02:31 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: Hi Richard, On 11/12/2019 7:45 am, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. 
With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based > optimizations are reverted just before an agent acquires the reference to an object. In the JBS item > you'll find more details. Most of the details here are in areas I can comment on in detail, but I did take an initial general look at things. The only thing that jumped out at me is that I think the DeoptimizeObjectsALotThread should be a hidden thread. + bool is_hidden_from_external_view() const { return true; } Also I don't see any testing of the DeoptimizeObjectsALotThread. Without active testing this will just bit-rot. Also on the tests I don't understand your @requires clause: @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & (vm.opt.TieredCompilation != true)) This seems to require that TieredCompilation is disabled, but tiered is our normal mode of operation. ?? Thanks, David > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From igor.veresov at oracle.com Wed Dec 11 07:11:26 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 10 Dec 2019 23:11:26 -0800 Subject: RFR(XL) 8235634: Update Graal Message-ID: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> Webrev: http://cr.openjdk.java.net/~iveresov/8235634/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8235634 The list of changes can be found in the JBS issue. Thanks, igor From tom.rodriguez at oracle.com Wed Dec 11 07:21:12 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Dec 2019 23:21:12 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> Message-ID: <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> > Yes, right. I am confusing because not in all places we do this > sequence: mark_for_deoptimization, make_not_entrant, patch frames. > Patching frames are not always at the same place as we do marking. > > I also noticed that you changed under which locks make_not_entrant is > done. May be better to pass nmethod into deoptimize_all_marked() and do > it there (by default pass NULL)? You mean the CodeCache_lock? The CompiledMethod_lock is only thing required for make_not_entrant and that's acquired in make_not_entrant_or_zombie. The CodeCache_lock is required for the safe iteration over the CodeCache itself. Though I kind of like your suggestion of passing the nmethod to the call and performing the make_not_entrant call there instead. It keeps the logic together. I'll make that change. tom > > Thanks, > Vladimir > >> >> tom >> >>> >>> Thanks, >>> Vladimir >>> >>> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>>> http://cr.openjdk.java.net/~never/8229377/webrev >>>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>>> >>>> This is a minor improvement to the JVMCI invalidate method to avoid >>>> scanning large code caches when invalidating a single nmethod. >>>> Instead the nmethod is directly made not_entrant.? In general I'm >>>> unclear what the benefit of the >>>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>>> Testing is in progress. >>>> >>>> JDK-8230884 had been previously duplicated against this because they >>>> overlapped a bit, but in the interest of clarity I separated them >>>> again. 
>>>> >>>> tom From tobias.hartmann at oracle.com Wed Dec 11 08:04:07 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 11 Dec 2019 09:04:07 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Message-ID: <334bc847-a427-5d33-e094-53961f08ad23@oracle.com> Hi Nils, nice cleanup! Why did you change "Node* var" -> "Node *var" in LoadNode::split_through_phi? Thanks, Tobias On 10.12.19 15:00, Nils Eliasson wrote: > Hi, > > I can across a lot of dead code in BarrierSetC2 while debugging a completely separate problem. This > is leftovers from legacy barrier implementations that no longer is needed. > > This is a quick cleanup where non-used parts are removed. I tried to keep it as simple as possible > to avoid introducing any bugs. There are additional cleanups that can be done in these files later. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 > > Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 > > > Please review, > > Nils Eliasson > From nils.eliasson at oracle.com Wed Dec 11 09:16:12 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Dec 2019 10:16:12 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <334bc847-a427-5d33-e094-53961f08ad23@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> <334bc847-a427-5d33-e094-53961f08ad23@oracle.com> Message-ID: <13d4c447-789d-8122-6d8a-23f5be200395@oracle.com> Hi Tobias, That's my IDE working against me. I'll change that back before pushing. Thanks! Nils On 2019-12-11 09:04, Tobias Hartmann wrote: > Hi Nils, > > nice cleanup! > > Why did you change "Node* var" -> "Node *var" in LoadNode::split_through_phi? > > Thanks, > Tobias > > On 10.12.19 15:00, Nils Eliasson wrote: >> Hi, >> >> I can across a lot of dead code in BarrierSetC2 while debugging a completely separate problem. This >> is leftovers from legacy barrier implementations that no longer is needed. >> >> This is a quick cleanup where non-used parts are removed. I tried to keep it as simple as possible >> to avoid introducing any bugs. There are additional cleanups that can be done in these files later. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 >> >> >> Please review, >> >> Nils Eliasson >> From vladimir.x.ivanov at oracle.com Wed Dec 11 09:37:43 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 12:37:43 +0300 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Message-ID: > http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ Looks good. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Wed Dec 11 09:38:46 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 11 Dec 2019 10:38:46 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Message-ID: <3606cc43-5acb-b2ee-b39e-cf5f1c377e97@oracle.com> Thanks Vladimir! 
Best regards, Tobias On 11.12.19 10:37, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ > > Looks good. > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Dec 11 09:51:25 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 12:51:25 +0300 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ Looks good. Best regards, Vladimir Ivanov > On 10.12.19 11:46, Roland Westrelin wrote: >> >> Hi Christian, >> >>> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ >> >> Is this correct? >> >> 364???????? set_idom(stmt, iffast_pred, dom_depth(stmt)); >> 366???????? set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); >> >> stmt is a non CFG so it doesn't have an idom but a control? >> >> Roland. >> From rwestrel at redhat.com Wed Dec 11 09:56:24 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 11 Dec 2019 10:56:24 +0100 Subject: RFR(S): 8235636: gc/shenandoah/compiler/TestUnsafeOffheapSwap.java fails after JDK-8226411 Message-ID: <87zhfz89tj.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8235636/webrev.00/ ShenandoahBarrierC2Support::is_dominator_same_ctrl() must take anti-dependencies into account otherwise when updating raw memory after a barrier is expanded, a load may be reordered wrt a store. Roland. From christian.hagedorn at oracle.com Wed Dec 11 10:02:42 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 11 Dec 2019 11:02:42 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <878snk9xl8.fsf@redhat.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> <878snk9xl8.fsf@redhat.com> Message-ID: <9ce1eba1-2fc1-67fb-c815-81cc5ee3e06c@oracle.com> Thank you for your review Roland! Best regards, Christian On 10.12.19 13:25, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ > > Looks good to me. > > Roland. > From christian.hagedorn at oracle.com Wed Dec 11 10:03:13 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 11 Dec 2019 11:03:13 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: <30e6086b-e309-c402-21f9-36b2ab4fb8de@oracle.com> Thank you for your review Vladimir! Best regards, Christian On 11.12.19 10:51, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> On 10.12.19 11:46, Roland Westrelin wrote: >>> >>> Hi Christian, >>> >>>> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ >>> >>> Is this correct? >>> >>> 364???????? set_idom(stmt, iffast_pred, dom_depth(stmt)); >>> 366???????? set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); >>> >>> stmt is a non CFG so it doesn't have an idom but a control? >>> >>> Roland. 
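To make the quoted remark concrete -- a data node has no idom entry, only a control setting -- the shape one would expect in PhaseIdealLoop looks like this (a sketch, not the actual 8233033 webrev.01 change):

    // Only CFG nodes participate in the dominator tree; for a data node the loop
    // optimizer records just its controlling CFG node.
    if (stmt->is_CFG()) {
      set_idom(stmt, iffast_pred, dom_depth(stmt));
    } else {
      set_ctrl(stmt, iffast_pred);   // data node: no idom, only a control
    }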
>>> From vladimir.x.ivanov at oracle.com Wed Dec 11 10:09:02 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 13:09:02 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> Message-ID: <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Yes, fully agree. Updated version: http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ Got rid of ret_value and enhanced the comments, but also fixed long-standing bug you noticed: AbsVF should have the same additional checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). Best regards, Vladimir Ivanov On 11.12.2019 02:31, John Rose wrote: > Actually I have one more comment about the new classification logic: > > The ?ret_value? idiom is terrible. > > I see a function which is complex, with something like this at the top: > > ? if (? simple size check ?) { > ? ? ret_value = false; ? // size not supported > ? } ? > > I want to read something more decisive like this: > > ? if (? simple size check ?) { > ? ? return false; ? // size not supported > ? } ? > > The ?ret_value? thingy adds only noise, and no clarity. > > It is more than an annoyance, hence my comment here. > The problem is that if I want to understand the quick check > above, I have to scroll down *past all the other checks* to see > if some joker does ?ret_value = true? before the ?return ret_value?, > subverting my understanding of the code. ?Basically, the ?ret_value? > nonsense makes the code impossible to break into understandable > parts. > > ? Grumpy John > > On Dec 10, 2019, at 12:34 PM, John Rose > wrote: >> >>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL >>> extensions), but maybe there'll be a moment when Skylake >>> (F+CD+BW+DQ+VL) can be chosen as the baseline. >> >> Yeah, I saw that coming after I visited the trusty intrinsics guide >> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>> >>> Anyway, I got your idea and it makes perfect sense to me to collect >>> such ideas. >> > From vladimir.x.ivanov at oracle.com Wed Dec 11 10:31:04 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 13:31:04 +0300 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> Message-ID: <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> Thanks for reviews, Vladimir and John. Updated version: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ On 11.12.2019 05:48, Vladimir Kozlov wrote: > In general I don't like using switches in this changes. In most examples > you have only 2 instructions to choose from which could be done with > 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and > never will be hit - you should hit first checks in supported vector size > code. Didn't have strong opinion about them (and still don't), so I refactored most of the switches to branches. Let me know how it looks now. Regarding ShouldNotReachHere(): it would be unfortunate if we have to take size code increase into account when using it to mark never-taken. 
Do you prefer "assert(false,...)" instead on for default case in switches? > I may prefer to see 2 AD instructions as you had in previous changes. Considering the main motivation is to reduce the number of instructions used, that would be a counter change. As I write later to John, I would like to see the dispatching be hidden inside MacroAssembler. It'll address your current concerns, right? > In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. Good catch! Fixed. > Why you need predicate for vabsnegD ? Other length is not supported anyway. Agree, fixed. On 11.12.2019 03:00, John Rose wrote: > Thank you, reviewed. > > For consistency, I?d expect to see the AD file mention vshiftq > instead of in or in addition to the very specific evpsraq. > Maybe actually call vshiftq (consistent with other parts of > AD) and comment that it calls evpsraq? I had to look inside the > macro assembler to verify that evpsraq was properly aligned > with the other cases. Or just leave a comment saying this is > what vshiftq would do also, like the other instructions. In general, I'd like to see all the hardware-specific dispatching logic to be moved into MacroAssembler and AD file just to call into them. But we (Jatin, Sandhya, and me) decided to limit the amount of refactorings and upstream what Jatin ended up with. Are you fine with covering evpsraq case in a follow-up change? > For vabsnegF, I suggest adding a comment here: > > + predicate(n->as_Vector()->length() == 2 || > + // case 4 is handled as a 1-operand instruction by vabsneg4F > + n->as_Vector()->length() == 8 || > + n->as_Vector()->length() == 16); > I took slightly different route and rewrote it as follows: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ + predicate(n->as_Vector()->length() != 4); // handled by 1-operand instruction vabsneg4F instruct vabsneg4F(vec dst, rRegI scratch) %{ + predicate(n->as_Vector()->length() == 4); It looks clearer than the previous version. Best regards, Vladimir Ivanov > > Vladimir > > On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235719 >> >> Merge AD instructions for the following vector nodes: >> ?? - LShiftV*, RShiftV*, URShiftV* >> ?? - AbsV* >> ?? - NegV* >> >> Individual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >> >> >> As Jatin described, merging is applied only to AD instructions of >> similar shape. There are some more opportunities for reduction/merging >> left, but they are deliberately left out for future work. >> >> The patch is derived from the initial version of generic vector >> support [1]. Generic vector support was reviewed earlier [2]. >> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? 
>> >> Best regards, >> Vladimir Ivanov >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> >> >> [2] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html >> From rkennke at redhat.com Wed Dec 11 10:55:36 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Dec 2019 11:55:36 +0100 Subject: RFR(S): 8235636: gc/shenandoah/compiler/TestUnsafeOffheapSwap.java fails after JDK-8226411 In-Reply-To: <87zhfz89tj.fsf@redhat.com> References: <87zhfz89tj.fsf@redhat.com> Message-ID: <73794273-64de-0ff5-e1a7-3cfb8202a69f@redhat.com> Looks good! Thanks! I presume that you have run appropriate testing.? Roman > http://cr.openjdk.java.net/~roland/8235636/webrev.00/ > > ShenandoahBarrierC2Support::is_dominator_same_ctrl() must take > anti-dependencies into account otherwise when updating raw memory after > a barrier is expanded, a load may be reordered wrt a store. > > Roland. > From vladimir.x.ivanov at oracle.com Wed Dec 11 11:53:17 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 14:53:17 +0300 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes Message-ID: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ https://bugs.openjdk.java.net/browse/JDK-8235756 Merge AD instructions for the following vector nodes: - DivVF/DivVD - SqrtVF/SqrtVD - FmaVF/FmaVD - AddReductionV* - MulReductionV* Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Dec 11 11:55:49 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 14:55:49 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Message-ID: > Yes, fully agree. Updated version: > ? http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ > > Got rid of ret_value and enhanced the comments, but also fixed > long-standing bug you noticed: AbsVF should have the same additional > checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). Additionally, got rid of ret_value in Matcher::match_rule_supported() and moved Op_RoundDoubleModeV there since it doesn't depend on vector length: + case Op_RoundDoubleModeV: + if (VM_Version::supports_avx() == false) { + return false; // 128bit vroundpd is not available + } break; Best regards, Vladimir Ivanov > On 11.12.2019 02:31, John Rose wrote: >> Actually I have one more comment about the new classification logic: >> >> The ?ret_value? idiom is terrible. >> >> I see a function which is complex, with something like this at the top: >> >> ?? if (? simple size check ?) { >> ?? ? ret_value = false; ? // size not supported >> ?? } ? >> >> I want to read something more decisive like this: >> >> ?? if (? simple size check ?) { >> ?? ? return false; ? // size not supported >> ?? } ? >> >> The ?ret_value? 
thingy adds only noise, and no clarity. >> >> It is more than an annoyance, hence my comment here. >> The problem is that if I want to understand the quick check >> above, I have to scroll down *past all the other checks* to see >> if some joker does ?ret_value = true? before the ?return ret_value?, >> subverting my understanding of the code. ?Basically, the ?ret_value? >> nonsense makes the code impossible to break into understandable >> parts. >> >> ? Grumpy John >> >> On Dec 10, 2019, at 12:34 PM, John Rose > > wrote: >>> >>>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL >>>> extensions), but maybe there'll be a moment when Skylake >>>> (F+CD+BW+DQ+VL) can be chosen as the baseline. >>> >>> Yeah, I saw that coming after I visited the trusty intrinsics guide >>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>>> >>>> Anyway, I got your idea and it makes perfect sense to me to collect >>>> such ideas. >>> >> From felix.yang at huawei.com Wed Dec 11 12:39:46 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 11 Dec 2019 12:39:46 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation Message-ID: Hi, Please review this patch fixing a crash in C2 superword transform phase. Bug: https://bugs.openjdk.java.net/browse/JDK-8235762 This is similar to JDK-8229694, but a little bit more complex. The JVM crashes in the testcase when trying to dereference mem at [9] which is NULL. This happens when converting packs into vector nodes in SuperWord::output() by reading an unexpected NULL value with SuperWord::align_to_ref() [8]. The corresponding field _align_to_ref is set to NULL at [7] since best_align_to_mem_ref was assigned NULL just before at [6]. _packset contained two packs and there were no memory operations left to be processed (memops is empty). As a result SuperWord::find_align_to_ref() will return NULL since it cannot find an alignment from two different types of mem operations. The loop in Test::vMeth is unrolled two times. 
As a result, main loop contains the following 12 memory operations: 954 LoadI === 633 985 955 [[ 953 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=320 !jvms: Test::vMeth @ bci:13 950 StoreI === 981 985 951 953 [[ 946 948 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=342 !jvms: Test::vMeth @ bci:16 948 LoadI === 647 950 949 [[ 947 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=369 !jvms: Test::vMeth @ bci:23 946 StoreI === 981 950 949 947 [[ 320 342 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=391,989 !jvms: Test::vMeth @ bci:26 320 LoadI === 633 946 318 [[ 321 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:13 342 StoreI === 981 946 340 321 [[ 391 369 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !jvms: Test::vMeth @ bci:16 369 LoadI === 647 342 367 [[ 370 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:23 391 StoreI === 981 342 367 370 [[ 772 985 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=989 !jvms: Test::vMeth @ bci:26 958 StoreB === 981 986 959 10 [[ 470 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=470,[990] !jvms: Test::vMeth @ bci:44 470 StoreB === 981 958 468 10 [[ 774 986 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=[990] !jvms: Test::vMeth @ bci:44 961 StoreL === 981 984 962 12 [[ 431 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=431,[991] !jvms: Test::vMeth @ bci:35 431 StoreL === 981 961 429 12 [[ 776 984 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=[991] !jvms: Test::vMeth @ bci:35 Consider the while loop at [1]: iter1: memops.size() = 12 mem_ref = 342 alignment: 342=0, 946=0, 391=4, 950=4 create_pack=true _packset: <342, 950> best_align_to_mem_ref = 342 iter2: memops.size() = 8 mem_ref = 431 alignment: 431=0, 961=8 create_pack=true _packset: <342, 950>, <431, 961> best_align_to_mem_ref = 342 iter3: memops.size() = 6 mem_ref = 470 alignment: 470=0,958=1 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958> best_align_to_mem_ref = 342 iter4: memops.size() = 4 mem_ref = 320 alignment: 320=0, 954=4 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958>, <320, 954> best_align_to_mem_ref = 342 iter5: memops.size() = 2 mem_ref = 369 alignment: 369=0, 948=4 create_pack=false _packset: <431, 961>, <470, 958> best_align_to_mem_ref = NULL (<342, 950> and <320, 954> are removed from _packset at [4]) For iter5, create_pack is set to false at [2]. As a result, the two memory operations in memops are poped at [3]. And 431 and 470 are pushed to memops at [5]. Then memops for call find_align_to_ref at [6] only contains 431 and 470 which are different in type. As a result, find_align_to_ref returns NULL. 
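For illustration only: the jtreg test attached to the bug is not reproduced in this thread, but a hypothetical Java reduction of the loop shape described above might look like the sketch below -- two int load/store pairs plus a long store and a byte store in one counted loop, so that after 2x unrolling the leftover memory ops are of incomparable types. All names are invented and this sketch is not verified to reproduce the crash.

public class SWAlignRepro {
    static int[]  iArr = new int[1030];
    static long[] lArr = new long[1030];
    static byte[] bArr = new byte[1030];

    // Two int load/store pairs plus a long store and a byte store per
    // iteration, mirroring the LoadI/StoreI, StoreL and StoreB mix above.
    static void vMeth(int x, long y, byte z) {
        for (int i = 0; i < 1000; i++) {
            iArr[i]     = iArr[i] + x;
            iArr[i + 1] = iArr[i + 1] - x;
            lArr[i]     = y;
            bArr[i]     = z;
        }
    }

    public static void main(String[] args) {
        for (int n = 0; n < 20000; n++) {  // warm up so C2 compiles vMeth
            vMeth(3, 7L, (byte) 1);
        }
    }
}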
[1] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l582 [2] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l637 [3] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l684 [4] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l693 [5] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l715 [6] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l717 [7] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l742 [8] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l2346 [9] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l3626 Proposed fix chooses one alignment from the remaining packs ignoring whether the memory operations are comparable or not. This is currently under testing: https://bugs.openjdk.java.net/secure/attachment/86058/fix-superword.patch diff -r 6cf6761c444e src/hotspot/share/opto/superword.cpp --- a/src/hotspot/share/opto/superword.cpp Wed Dec 11 12:12:39 2019 +0100 +++ b/src/hotspot/share/opto/superword.cpp Wed Dec 11 20:17:10 2019 +0800 @@ -576,12 +576,13 @@ } Node_List align_to_refs; + int max_idx; int best_iv_adjustment = 0; MemNode* best_align_to_mem_ref = NULL; while (memops.size() != 0) { // Find a memory reference to align to. - MemNode* mem_ref = find_align_to_ref(memops); + MemNode* mem_ref = find_align_to_ref(memops, max_idx); if (mem_ref == NULL) break; align_to_refs.push(mem_ref); int iv_adjustment = get_iv_adjustment(mem_ref); @@ -699,34 +700,36 @@ // Put memory ops from remaining packs back on memops list for // the best alignment search. uint orig_msize = memops.size(); - if (_packset.length() == 1 && orig_msize == 0) { - // If there are no remaining memory ops and only 1 pack we have only one choice - // for the alignment - Node_List* p = _packset.at(0); - assert(p->size() > 0, "sanity"); + for (int i = 0; i < _packset.length(); i++) { + Node_List* p = _packset.at(i); MemNode* s = p->at(0)->as_Mem(); assert(!same_velt_type(s, mem_ref), "sanity"); - best_align_to_mem_ref = s; - } else { - for (int i = 0; i < _packset.length(); i++) { - Node_List* p = _packset.at(i); - MemNode* s = p->at(0)->as_Mem(); - assert(!same_velt_type(s, mem_ref), "sanity"); - memops.push(s); + memops.push(s); + } + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); + if (best_align_to_mem_ref == NULL) { + if (TraceSuperWord) { + tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); } - best_align_to_mem_ref = find_align_to_ref(memops); - if (best_align_to_mem_ref == NULL) { - if (TraceSuperWord) { - tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); + if (_packset.length() > 0) { + if (orig_msize == 0) { + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); + } else { + for (int i = 0; i < orig_msize; i++) { + memops.remove(0); + } + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); + assert(best_align_to_mem_ref == NULL, "sanity"); + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); } - break; } - best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); - NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) - // Restore list. 
- while (memops.size() > orig_msize) - (void)memops.pop(); + break; } + best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); + NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) + // Restore list. + while (memops.size() > orig_msize) + (void)memops.pop(); } } // unaligned memory accesses @@ -761,7 +764,7 @@ // Find a memory reference to align the loop induction variable to. // Looks first at stores then at loads, looking for a memory reference // with the largest number of references similar to it. -MemNode* SuperWord::find_align_to_ref(Node_List &memops) { +MemNode* SuperWord::find_align_to_ref(Node_List &memops, int &idx) { GrowableArray cmp_ct(arena(), memops.size(), memops.size(), 0); // Count number of comparable memory ops @@ -848,6 +851,7 @@ } #endif + idx = max_idx; if (max_ct > 0) { #ifdef ASSERT if (TraceSuperWord) { diff -r 6cf6761c444e src/hotspot/share/opto/superword.hpp --- a/src/hotspot/share/opto/superword.hpp Wed Dec 11 12:12:39 2019 +0100 +++ b/src/hotspot/share/opto/superword.hpp Wed Dec 11 20:17:10 2019 +0800 @@ -408,7 +408,7 @@ void print_loop(bool whole); #endif // Find a memory reference to align the loop induction variable to. - MemNode* find_align_to_ref(Node_List &memops); + MemNode* find_align_to_ref(Node_List &memops, int &idx); // Calculate loop's iv adjustment for this memory ops. int get_iv_adjustment(MemNode* mem); // Can the preloop align the reference to position zero in the vector? Thanks, Felix From tobias.hartmann at oracle.com Wed Dec 11 12:50:00 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 11 Dec 2019 13:50:00 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: Message-ID: Hi Felix, is this a duplicate of JDK-8235700? Please create a webrev and also include the regression test. Thanks, Tobias On 11.12.19 13:39, Yangfei (Felix) wrote: > Hi, > > Please review this patch fixing a crash in C2 superword transform phase. > Bug: https://bugs.openjdk.java.net/browse/JDK-8235762 > > This is similar to JDK-8229694, but a little bit more complex. > The JVM crashes in the testcase when trying to dereference mem at [9] which is NULL. > This happens when converting packs into vector nodes in SuperWord::output() by reading an unexpected NULL value with SuperWord::align_to_ref() [8]. > The corresponding field _align_to_ref is set to NULL at [7] since best_align_to_mem_ref was assigned NULL just before at [6]. > _packset contained two packs and there were no memory operations left to be processed (memops is empty). As a result SuperWord::find_align_to_ref() > will return NULL since it cannot find an alignment from two different types of mem operations. > > The loop in Test::vMeth is unrolled two times. 
As a result, main loop contains the following 12 memory operations: > > 954 LoadI === 633 985 955 [[ 953 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=320 !jvms: Test::vMeth @ bci:13 > 950 StoreI === 981 985 951 953 [[ 946 948 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=342 !jvms: Test::vMeth @ bci:16 > 948 LoadI === 647 950 949 [[ 947 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=369 !jvms: Test::vMeth @ bci:23 > 946 StoreI === 981 950 949 947 [[ 320 342 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=391,989 !jvms: Test::vMeth @ bci:26 > 320 LoadI === 633 946 318 [[ 321 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:13 > 342 StoreI === 981 946 340 321 [[ 391 369 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !jvms: Test::vMeth @ bci:16 > 369 LoadI === 647 342 367 [[ 370 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:23 > 391 StoreI === 981 342 367 370 [[ 772 985 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=989 !jvms: Test::vMeth @ bci:26 > > 958 StoreB === 981 986 959 10 [[ 470 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=470,[990] !jvms: Test::vMeth @ bci:44 > 470 StoreB === 981 958 468 10 [[ 774 986 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=[990] !jvms: Test::vMeth @ bci:44 > > 961 StoreL === 981 984 962 12 [[ 431 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=431,[991] !jvms: Test::vMeth @ bci:35 > 431 StoreL === 981 961 429 12 [[ 776 984 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=[991] !jvms: Test::vMeth @ bci:35 > > > Consider the while loop at [1]: > iter1: > memops.size() = 12 mem_ref = 342 alignment: 342=0, 946=0, 391=4, 950=4 create_pack=true _packset: <342, 950> best_align_to_mem_ref = 342 > iter2: > memops.size() = 8 mem_ref = 431 alignment: 431=0, 961=8 create_pack=true _packset: <342, 950>, <431, 961> best_align_to_mem_ref = 342 > iter3: > memops.size() = 6 mem_ref = 470 alignment: 470=0,958=1 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958> best_align_to_mem_ref = 342 > iter4: > memops.size() = 4 mem_ref = 320 alignment: 320=0, 954=4 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958>, <320, 954> best_align_to_mem_ref = 342 > iter5: > memops.size() = 2 mem_ref = 369 alignment: 369=0, 948=4 create_pack=false _packset: <431, 961>, <470, 958> best_align_to_mem_ref = NULL (<342, 950> and <320, 954> are removed from _packset at [4]) > > For iter5, create_pack is set to false at [2]. As a result, the two memory operations in memops are poped at [3]. And 431 and 470 are pushed to memops at [5]. > > Then memops for call find_align_to_ref at [6] only contains 431 and 470 which are different in type. As a result, find_align_to_ref returns NULL. 
> > [1] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l582 > [2] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l637 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l684 > [4] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l693 > [5] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l715 > [6] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l717 > [7] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l742 > [8] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l2346 > [9] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l3626 > > Proposed fix chooses one alignment from the remaining packs ignoring whether the memory operations are comparable or not. > This is currently under testing: https://bugs.openjdk.java.net/secure/attachment/86058/fix-superword.patch > > diff -r 6cf6761c444e src/hotspot/share/opto/superword.cpp > --- a/src/hotspot/share/opto/superword.cpp Wed Dec 11 12:12:39 2019 +0100 > +++ b/src/hotspot/share/opto/superword.cpp Wed Dec 11 20:17:10 2019 +0800 > @@ -576,12 +576,13 @@ > } > Node_List align_to_refs; > + int max_idx; > int best_iv_adjustment = 0; > MemNode* best_align_to_mem_ref = NULL; > while (memops.size() != 0) { > // Find a memory reference to align to. > - MemNode* mem_ref = find_align_to_ref(memops); > + MemNode* mem_ref = find_align_to_ref(memops, max_idx); > if (mem_ref == NULL) break; > align_to_refs.push(mem_ref); > int iv_adjustment = get_iv_adjustment(mem_ref); > @@ -699,34 +700,36 @@ > // Put memory ops from remaining packs back on memops list for > // the best alignment search. 
> uint orig_msize = memops.size(); > - if (_packset.length() == 1 && orig_msize == 0) { > - // If there are no remaining memory ops and only 1 pack we have only one choice > - // for the alignment > - Node_List* p = _packset.at(0); > - assert(p->size() > 0, "sanity"); > + for (int i = 0; i < _packset.length(); i++) { > + Node_List* p = _packset.at(i); > MemNode* s = p->at(0)->as_Mem(); > assert(!same_velt_type(s, mem_ref), "sanity"); > - best_align_to_mem_ref = s; > - } else { > - for (int i = 0; i < _packset.length(); i++) { > - Node_List* p = _packset.at(i); > - MemNode* s = p->at(0)->as_Mem(); > - assert(!same_velt_type(s, mem_ref), "sanity"); > - memops.push(s); > + memops.push(s); > + } > + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); > + if (best_align_to_mem_ref == NULL) { > + if (TraceSuperWord) { > + tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); > } > - best_align_to_mem_ref = find_align_to_ref(memops); > - if (best_align_to_mem_ref == NULL) { > - if (TraceSuperWord) { > - tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); > + if (_packset.length() > 0) { > + if (orig_msize == 0) { > + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); > + } else { > + for (int i = 0; i < orig_msize; i++) { > + memops.remove(0); > + } > + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); > + assert(best_align_to_mem_ref == NULL, "sanity"); > + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); > } > - break; > } > - best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); > - NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) > - // Restore list. > - while (memops.size() > orig_msize) > - (void)memops.pop(); > + break; > } > + best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); > + NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) > + // Restore list. > + while (memops.size() > orig_msize) > + (void)memops.pop(); > } > } // unaligned memory accesses > @@ -761,7 +764,7 @@ > // Find a memory reference to align the loop induction variable to. > // Looks first at stores then at loads, looking for a memory reference > // with the largest number of references similar to it. > -MemNode* SuperWord::find_align_to_ref(Node_List &memops) { > +MemNode* SuperWord::find_align_to_ref(Node_List &memops, int &idx) { > GrowableArray cmp_ct(arena(), memops.size(), memops.size(), 0); > // Count number of comparable memory ops > @@ -848,6 +851,7 @@ > } > #endif > + idx = max_idx; > if (max_ct > 0) { > #ifdef ASSERT > if (TraceSuperWord) { > diff -r 6cf6761c444e src/hotspot/share/opto/superword.hpp > --- a/src/hotspot/share/opto/superword.hpp Wed Dec 11 12:12:39 2019 +0100 > +++ b/src/hotspot/share/opto/superword.hpp Wed Dec 11 20:17:10 2019 +0800 > @@ -408,7 +408,7 @@ > void print_loop(bool whole); > #endif > // Find a memory reference to align the loop induction variable to. > - MemNode* find_align_to_ref(Node_List &memops); > + MemNode* find_align_to_ref(Node_List &memops, int &idx); > // Calculate loop's iv adjustment for this memory ops. > int get_iv_adjustment(MemNode* mem); > // Can the preloop align the reference to position zero in the vector? 
> > > Thanks, > Felix > From felix.yang at huawei.com Wed Dec 11 12:55:31 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 11 Dec 2019 12:55:31 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: Message-ID: Hi Tobias, I didn't noticed JDK-8235700 when I started to fix this bug several days ago. Sure, I will prepare a webrev. Comments are welcome. Thanks, Felix > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Wednesday, December 11, 2019 8:50 PM > To: Yangfei (Felix) ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation > > Hi Felix, > > is this a duplicate of JDK-8235700? > > Please create a webrev and also include the regression test. > > Thanks, > Tobias > > On 11.12.19 13:39, Yangfei (Felix) wrote: > > Hi, > > > > Please review this patch fixing a crash in C2 superword transform phase. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235762 > > > > This is similar to JDK-8229694, but a little bit more complex. > > The JVM crashes in the testcase when trying to dereference mem at [9] > which is NULL. > > This happens when converting packs into vector nodes in SuperWord::output() > by reading an unexpected NULL value with SuperWord::align_to_ref() [8]. > > The corresponding field _align_to_ref is set to NULL at [7] since > best_align_to_mem_ref was assigned NULL just before at [6]. > > _packset contained two packs and there were no memory operations left > > to be processed (memops is empty). As a result > SuperWord::find_align_to_ref() will return NULL since it cannot find an > alignment from two different types of mem operations. > > > > The loop in Test::vMeth is unrolled two times. 
As a result, main loop > contains the following 12 memory operations: > > > > 954 LoadI === 633 985 955 [[ 953 ]] @int[int:>=0]:exact+any > *, idx=6; #int !orig=320 !jvms: Test::vMeth @ bci:13 > > 950 StoreI === 981 985 951 953 [[ 946 948 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !orig=342 !jvms: Test::vMeth @ bci:16 > > 948 LoadI === 647 950 949 [[ 947 ]] @int[int:>=0]:exact+any > *, idx=6; #int !orig=369 !jvms: Test::vMeth @ bci:23 > > 946 StoreI === 981 950 949 947 [[ 320 342 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !orig=391,989 !jvms: Test::vMeth @ bci:26 > > 320 LoadI === 633 946 318 [[ 321 ]] @int[int:>=0]:exact+any > *, idx=6; #int !jvms: Test::vMeth @ bci:13 > > 342 StoreI === 981 946 340 321 [[ 391 369 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !jvms: Test::vMeth @ bci:16 > > 369 LoadI === 647 342 367 [[ 370 ]] @int[int:>=0]:exact+any > *, idx=6; #int !jvms: Test::vMeth @ bci:23 > > 391 StoreI === 981 342 367 370 [[ 772 985 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !orig=989 !jvms: Test::vMeth @ bci:26 > > > > 958 StoreB === 981 986 959 10 [[ 470 ]] > @byte[int:>=0]:exact+any *, idx=10; Memory: > @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=470,[990] !jvms: > Test::vMeth @ bci:44 > > 470 StoreB === 981 958 468 10 [[ 774 986 ]] > @byte[int:>=0]:exact+any *, idx=10; Memory: > @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=[990] !jvms: Test::vMeth @ > bci:44 > > > > 961 StoreL === 981 984 962 12 [[ 431 ]] > @long[int:>=0]:exact+any *, idx=8; Memory: > @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=431,[991] !jvms: Test::vMeth > @ bci:35 > > 431 StoreL === 981 961 429 12 [[ 776 984 ]] > @long[int:>=0]:exact+any *, idx=8; Memory: > @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=[991] !jvms: Test::vMeth @ > bci:35 > > > > > > Consider the while loop at [1]: > > iter1: > > memops.size() = 12 mem_ref = 342 alignment: 342=0, 946=0, 391=4, > > 950=4 create_pack=true _packset: <342, 950> best_align_to_mem_ref > = > > 342 > > iter2: > > memops.size() = 8 mem_ref = 431 alignment: 431=0, 961=8 > > create_pack=true _packset: <342, 950>, <431, 961> > > best_align_to_mem_ref = 342 > > iter3: > > memops.size() = 6 mem_ref = 470 alignment: 470=0,958=1 > > create_pack=true _packset: <342, 950>, <431, 961>, <470, 958> > > best_align_to_mem_ref = 342 > > iter4: > > memops.size() = 4 mem_ref = 320 alignment: 320=0, 954=4 > > create_pack=true _packset: <342, 950>, <431, 961>, <470, 958>, <320, > > 954> best_align_to_mem_ref = 342 > > iter5: > > memops.size() = 2 mem_ref = 369 alignment: 369=0, 948=4 > > create_pack=false _packset: <431, 961>, <470, 958> > > best_align_to_mem_ref = NULL (<342, 950> and <320, 954> are removed > > from _packset at [4]) > > > > For iter5, create_pack is set to false at [2]. As a result, the two memory > operations in memops are poped at [3]. And 431 and 470 are pushed to > memops at [5]. > > > > Then memops for call find_align_to_ref at [6] only contains 431 and 470 which > are different in type. As a result, find_align_to_ref returns NULL. 
> > > > [1] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l582 > > [2] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l637 > > [3] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l684 > > [4] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l693 > > [5] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l715 > > [6] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l717 > > [7] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l742 > > [8] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l2346 > > [9] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l3626 > > > > Proposed fix chooses one alignment from the remaining packs ignoring > whether the memory operations are comparable or not. > > This is currently under testing: > > https://bugs.openjdk.java.net/secure/attachment/86058/fix-superword.pa > > tch > > > > diff -r 6cf6761c444e src/hotspot/share/opto/superword.cpp > > --- a/src/hotspot/share/opto/superword.cpp Wed Dec 11 12:12:39 2019 > > +0100 > > +++ b/src/hotspot/share/opto/superword.cpp Wed Dec 11 20:17:10 > 2019 +0800 > > @@ -576,12 +576,13 @@ > > } > > Node_List align_to_refs; > > + int max_idx; > > int best_iv_adjustment = 0; > > MemNode* best_align_to_mem_ref = NULL; > > while (memops.size() != 0) { > > // Find a memory reference to align to. > > - MemNode* mem_ref = find_align_to_ref(memops); > > + MemNode* mem_ref = find_align_to_ref(memops, max_idx); > > if (mem_ref == NULL) break; > > align_to_refs.push(mem_ref); > > int iv_adjustment = get_iv_adjustment(mem_ref); @@ -699,34 > > +700,36 @@ > > // Put memory ops from remaining packs back on memops list for > > // the best alignment search. 
> > uint orig_msize = memops.size(); > > - if (_packset.length() == 1 && orig_msize == 0) { > > - // If there are no remaining memory ops and only 1 pack we > have only one choice > > - // for the alignment > > - Node_List* p = _packset.at(0); > > - assert(p->size() > 0, "sanity"); > > + for (int i = 0; i < _packset.length(); i++) { > > + Node_List* p = _packset.at(i); > > MemNode* s = p->at(0)->as_Mem(); > > assert(!same_velt_type(s, mem_ref), "sanity"); > > - best_align_to_mem_ref = s; > > - } else { > > - for (int i = 0; i < _packset.length(); i++) { > > - Node_List* p = _packset.at(i); > > - MemNode* s = p->at(0)->as_Mem(); > > - assert(!same_velt_type(s, mem_ref), "sanity"); > > - memops.push(s); > > + memops.push(s); > > + } > > + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); > > + if (best_align_to_mem_ref == NULL) { > > + if (TraceSuperWord) { > > + tty->print_cr("SuperWord::find_adjacent_refs(): > > + best_align_to_mem_ref == NULL"); > > } > > - best_align_to_mem_ref = find_align_to_ref(memops); > > - if (best_align_to_mem_ref == NULL) { > > - if (TraceSuperWord) { > > - tty->print_cr("SuperWord::find_adjacent_refs(): > best_align_to_mem_ref == NULL"); > > + if (_packset.length() > 0) { > > + if (orig_msize == 0) { > > + best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > > + } else { > > + for (int i = 0; i < orig_msize; i++) { > > + memops.remove(0); > > + } > > + best_align_to_mem_ref = find_align_to_ref(memops, > max_idx); > > + assert(best_align_to_mem_ref == NULL, "sanity"); > > + best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > > } > > - break; > > } > > - best_iv_adjustment = > get_iv_adjustment(best_align_to_mem_ref); > > - > NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, > best_iv_adjustment);) > > - // Restore list. > > - while (memops.size() > orig_msize) > > - (void)memops.pop(); > > + break; > > } > > + best_iv_adjustment = > get_iv_adjustment(best_align_to_mem_ref); > > + > NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, > best_iv_adjustment);) > > + // Restore list. > > + while (memops.size() > orig_msize) > > + (void)memops.pop(); > > } > > } // unaligned memory accesses > > @@ -761,7 +764,7 @@ > > // Find a memory reference to align the loop induction variable to. > > // Looks first at stores then at loads, looking for a memory reference > > // with the largest number of references similar to it. > > -MemNode* SuperWord::find_align_to_ref(Node_List &memops) { > > +MemNode* SuperWord::find_align_to_ref(Node_List &memops, int &idx) { > > GrowableArray cmp_ct(arena(), memops.size(), memops.size(), 0); > > // Count number of comparable memory ops @@ -848,6 +851,7 @@ > > } > > #endif > > + idx = max_idx; > > if (max_ct > 0) { > > #ifdef ASSERT > > if (TraceSuperWord) { > > diff -r 6cf6761c444e src/hotspot/share/opto/superword.hpp > > --- a/src/hotspot/share/opto/superword.hpp Wed Dec 11 12:12:39 > 2019 +0100 > > +++ b/src/hotspot/share/opto/superword.hpp Wed Dec 11 20:17:10 > 2019 +0800 > > @@ -408,7 +408,7 @@ > > void print_loop(bool whole); > > #endif > > // Find a memory reference to align the loop induction variable to. > > - MemNode* find_align_to_ref(Node_List &memops); > > + MemNode* find_align_to_ref(Node_List &memops, int &idx); > > // Calculate loop's iv adjustment for this memory ops. > > int get_iv_adjustment(MemNode* mem); > > // Can the preloop align the reference to position zero in the vector? 
> > > > > > Thanks, > > Felix > > From kirk at kodewerk.com Wed Dec 11 13:14:17 2019 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Wed, 11 Dec 2019 14:14:17 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <22E085AB-D5FF-478C-9BDC-2BB0C021A656@kodewerk.com> +1 to what Charlie has said. Additionally we?ve been exploring EA with an eye on improving it?s ability to catch and eliminate transients. So far the work is showing lots of promise. Kind regards, Kirk > On Dec 9, 2019, at 10:31 PM, Charles Oliver Nutter wrote: > > This gets a big +1 for me, but honestly even better would be eliminating > the hard inline level limit altogether in favor of other metrics (inlined > code size, optimized node count, incremental inlining). Ignoring that for > the moment... > > Anecdotally, I can say that there are many code paths in JRuby that don't > inline fully *solely* because of this limit. For example, there are objects > in the numeric tower that out of compatibility necessity have constructor > paths that get dangerously close to 9 levels deep: > > calling code-> Fixnum factory -> Fixnum constructor -> Integer -> Numeric > -> Object -> BasicObject -> j.l.Object > > ...and that would just inline into the first level of Ruby code. The last > five levels here are mostly just "super()" calls. Ideally we would want to > have a few levels of Ruby code inlining together too. > > If this path doesn't inline, we've got no chance for escape analysis and > other optimizations to reduce the transient numeric objects. > > A question relating to JRuby: how does the inline level interact with > MethodHandle/LambdaForm @ForceInline? Does that get around it? ALWAYS? > Clearly we see most of our invokedynamic call sites inline, but I have > little understanding of the actual cost of the five to 10 to 20 levels of > LFs between an indy site and the eventual target. > > In any case... +1, inline all the things. > > - Charlie > > On Sun, Dec 8, 2019 at 3:42 PM Claes Redestad > wrote: > >> Hi, >> >> increasing MaxInlineLevel can substantially improve performance in some >> benchmarks[1], and has been reported to help applications implemented in >> scala in particular. >> >> There is always some risk of regressions when tweaking the default >> inlining settings. I've done a number of experiments to ascertain that >> the effect of increasing this on a wide array of benchmarks. With 15 all >> benchmarks tested are show either neutral or positive results, with no >> observed regression w.r.t. compilation speed or code cache usage. >> >> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >> >> Thanks! >> >> /Claes >> >> [1] One http://renaissance.dev sub-benchmark improve by almost 3x with >> an increase from 9 to 15. >> From nils.eliasson at oracle.com Wed Dec 11 13:46:31 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Dec 2019 14:46:31 +0100 Subject: RFR: 8234328: VectorSet::clear can cause fragmentation In-Reply-To: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> Message-ID: <68719475-5d01-333f-6ac9-00bd201be370@oracle.com> Hi Claes, Your change looks good. Reviewed. Regards, Nils On 2019-11-19 11:18, Claes Redestad wrote: > Hi, > > today, VectorSet::clear "reclaims" storage when the size is large. 
> > However, since the backing array is allocated in a resource arena, this > is dubious since the currently retained memory is only actually freed > and made reusable if it's currently the last chunk of memory allocated > in the arena. This means a clear() is likely to just waste the allocated > memory until we exit the current resource scope > > Instead, I propose a strategy where instead of "freeing" we keep track > of the currently allocated size of the VectorSet separately from the in- > use size. We can then defer the memset to reset/clear the memory to the > next time we need to grow, thus avoiding unnecessary reallocations and > memsets. This limits the memory waste. > > Bug:??? https://bugs.openjdk.java.net/browse/JDK-8234328 > Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/ > > Testing: tier1-3 > > Either of reset() or clear() could now be removed, which seems like a > straightforward follow-up RFE. With some convincing I could roll it into > this patch. > > Thanks! > > /Claes From zhuoren.wz at alibaba-inc.com Tue Dec 10 15:43:23 2019 From: zhuoren.wz at alibaba-inc.com (=?UTF-8?B?V2FuZyBaaHVvKFpodW9yZW4p?=) Date: Tue, 10 Dec 2019 23:43:23 +0800 Subject: =?UTF-8?B?UmU6IFthYXJjaDY0LXBvcnQtZGV2IF0gY3Jhc2ggZHVlIHRvIGxvbmcgb2Zmc2V0?= In-Reply-To: <26ad8b0f-95a6-f126-7ea1-aa59291145d4@redhat.com> References: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> , <26ad8b0f-95a6-f126-7ea1-aa59291145d4@redhat.com> Message-ID: Pengfei, thanks for your comments. Here is updated patch. http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.01/ Haley, please also have a look at this patch. Regards, Zhuoren ------------------------------------------------------------------ From:Andrew Haley Sent At:2019 Dec. 10 (Tue.) 22:06 To:Pengfei Li (Arm Technology China) Cc:hotspot compiler ; aarch64-port-dev at openjdk.java.net Subject:Re: [aarch64-port-dev ] crash due to long offset On 12/10/19 8:28 AM, Pengfei Li (Arm Technology China) wrote: > Hi Zhuoren, > >> I also wrote a patch to solve this issue, please also review. >> http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.00/jdk13u.pat >> ch > Thanks for your patch. I (NOT a reviewer) eyeballed your fix and found a probable mistake. > > In "enc_class aarch64_enc_str(iRegL src, memory mem) %{ ... %}", you have "if (($mem$$index == -1) && ($mem$$disp > 0)& (($mem$$disp & 0x7) != 0) && ($mem$$disp > 255))". > Should it be "&&" instead of "&" in the middle? > > Another question: Is it possible to add the logic into loadStore() or another new function instead of duplicating it everywhere in aarch64.ad? > > I've also CC'ed this to hotspot-compiler-dev because all hotspot compiler patches (including AArch64 specific) should go through it for review. I'm looking at this. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From matthias.baesken at sap.com Wed Dec 11 14:31:19 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Wed, 11 Dec 2019 14:31:19 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for posting this . I put your change into our internal build+test queue . 
We currently do not have something like you described ( a P9 QEMU VM (emulation) with NVDIMM support ) in our test landscape, but It does not hurt to have the patch in our builds/tests anyway .... Best regards, Matthias > > Hi, > > Could the following change be reviewed please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8234599 > Webrev: http://cr.openjdk.java.net/~gromero/8234599/v1/ > > POWER9 does not have any cache management or barrier instructions aimed > specifically to deal with persistent memory (NVDIMM), hence in Power ISA > v3.0 > there are no instructions like data cache flush/store or sync instructions > specific to persistent memory. Nonetheless, in some cases (like through > hardware > emulation) POWER9 can support NVDIMM and if Linux supports DAX (direct > mapping of > persisnt memory) and a /dev/pmem device is available it's possible to use > data > cache and sync instructions in the ISA (which are not explicitly aimed to > persistent memory) on a memory region backed by DAX, i.e. mapped using > new > mmap() flag MAP_SYNC for persistent memory), so these instructions will > have the > same semantics as the instructions for data cache flush / store and sync on > other architectures supporting NVDIMM and that have explicit instructions to > deal > with persistent memory. > > This change adds support for JEP-352 on POWER9 using 'dcbst' plus a sync > instruction to sync data to a memory backed by DAX (to persistent memory) > when > that's required. > > Moreover, that change also paves the way for supporting NVDIMM on > future > Power CPUs that might have new data cache management and barrier > instructions > explicitly to deal with persistent memory. > > The change was developed and tested using a P9 QEMU VM (emulation) > with NVDIMM > support. For details on how to setup a proper QEMU + Linux kernel w/ a > /dev/pmem > device configured please see recipe in [2]. > > The JVM on a POWER9 machine with a /dev/pmem device properly set and > with that > change applied is able to pass the test for JEP-352 [3]. The JVM is also able > to pass all tests of Mashona library [4]. > > When DAX is not supported, like on POWER8 and POWER9 w/o DAX support, > OS won't > support mmap()'s MAP_SYNC flag, so kernel will return EOPNOTSUP when > code will > try to allocate memory backed by a persistent memory device, so the JVM > will get > a "java.io.IOException: Operation not supported". Naturally, on machines > that don't support writebacks or pmem supports_data_cache_line_flush() > will > return false, but even in that case JVM will hit a EOPNOTSUPP and get a > "java.io.IOException: Operation not supported" sooner than it has the > chance to > try to emit any writeback + sync instructions. > > Thank you. > > Best regards, > Gustavo > > [1] http://man7.org/linux/man-pages/man2/mmap.2.html > [2] https://github.com/gromero/nvdimm > [3] > http://hg.openjdk.java.net/jdk/jdk/file/336885e766af/test/jdk/java/nio/Ma > ppedByteBuffer/PmemTest.java > [4] https://github.com/jhalliday/mashona From christian.hagedorn at oracle.com Wed Dec 11 14:53:47 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 11 Dec 2019 15:53:47 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: Message-ID: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Hi Felix Thanks for working on that. Your fix also seems to work for JDK-8235700. I closed that one as a duplicate of yours. 
>>> + for (int i = 0; i < orig_msize; i++) { Should be uint since orig_msize is a uint >>> + best_align_to_mem_ref = find_align_to_ref(memops, >> max_idx); >>> + assert(best_align_to_mem_ref == NULL, "sanity"); You can merge these two lines together into assert(find_align_to_ref(memops, max_idx) == NULL, "sanity"); since the call belongs to the sanity check. Or just surround it by a #ifdef ASSERT. >>> + idx = max_idx; Is max_idx always guaranteed to be valid and not -1 when accessing it later? Best regards, Christian From richard.reingruber at sap.com Wed Dec 11 15:07:29 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Wed, 11 Dec 2019 15:07:29 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: Hi David, > Most of the details here are in areas I can comment on in detail, but I > did take an initial general look at things. Thanks for taking the time! > The only thing that jumped out at me is that I think the > DeoptimizeObjectsALotThread should be a hidden thread. > > + bool is_hidden_from_external_view() const { return true; } Yes, it should. Will add the method like above. > Also I don't see any testing of the DeoptimizeObjectsALotThread. Without > active testing this will just bit-rot. DeoptimizeObjectsALot is meant for stress testing with a larger workload. I will add a minimal test to keep it fresh. > Also on the tests I don't understand your @requires clause: > > @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & > (vm.opt.TieredCompilation != true)) > > This seems to require that TieredCompilation is disabled, but tiered is > our normal mode of operation. ?? > I removed the clause. I guess I wanted to target the tests towards the code they are supposed to test, and it's easier to analyze failures w/o tiered compilation and with just one compiler thread. Additionally I will make use of compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. Thanks, Richard. -----Original Message----- From: David Holmes Sent: Mittwoch, 11. Dezember 2019 08:03 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, On 11/12/2019 7:45 am, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based > optimizations are reverted just before an agent acquires the reference to an object. 
In the JBS item > you'll find more details. Most of the details here are in areas I can comment on in detail, but I did take an initial general look at things. The only thing that jumped out at me is that I think the DeoptimizeObjectsALotThread should be a hidden thread. + bool is_hidden_from_external_view() const { return true; } Also I don't see any testing of the DeoptimizeObjectsALotThread. Without active testing this will just bit-rot. Also on the tests I don't understand your @requires clause: @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & (vm.opt.TieredCompilation != true)) This seems to require that TieredCompilation is disabled, but tiered is our normal mode of operation. ?? Thanks, David > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From felix.yang at huawei.com Wed Dec 11 15:11:42 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 11 Dec 2019 15:11:42 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Christian, Thanks for the suggestions. Comments inlined. > -----Original Message----- > From: Christian Hagedorn [mailto:christian.hagedorn at oracle.com] > Sent: Wednesday, December 11, 2019 10:54 PM > To: Yangfei (Felix) ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation > > Hi Felix > > Thanks for working on that. Your fix also seems to work for JDK-8235700. > I closed that one as a duplicate of yours. > > >>> + for (int i = 0; i < orig_msize; i++) { > > Should be uint since orig_msize is a uint -- Yes, will modify accordingly when I am preparing webrev. > > >>> + best_align_to_mem_ref = find_align_to_ref(memops, > >> max_idx); > >>> + assert(best_align_to_mem_ref == NULL, "sanity"); > > You can merge these two lines together into assert(find_align_to_ref(memops, > max_idx) == NULL, "sanity"); since the call belongs to the sanity check. Or just > surround it by a #ifdef ASSERT. -- The purpose of line 721 here is to calculate the max_idx. So I don't think it's suitable to treat this line as assertion logic. > >>> + idx = max_idx; > > Is max_idx always guaranteed to be valid and not -1 when accessing it later? -- Yes, I think so. When memops is not empty and the memory ops in memops are not comparable, find_align_to_ref will always sets its max_idx. Thanks, Felix From rwestrel at redhat.com Wed Dec 11 15:36:12 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 11 Dec 2019 16:36:12 +0100 Subject: RFR: 8235729: Shenandoah: Remove useless casting to non-constant In-Reply-To: <65746d9b-1b54-dd75-d340-a9f8e76929be@redhat.com> References: <65746d9b-1b54-dd75-d340-a9f8e76929be@redhat.com> Message-ID: <87r21a98nn.fsf@redhat.com> > http://cr.openjdk.java.net/~rkennke/JDK-8235729/webrev.00/ Looks good to me. Roland. From martin.doerr at sap.com Wed Dec 11 16:55:23 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 11 Dec 2019 16:55:23 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for implementing it. Unfortunately, we can't test it at the moment. 
I have a few change requests: macroAssembler_ppc.cpp I don't like silently emitting nothing in case !VM_Version::supports_data_cache_line_flush(). If you want to check for it, I suggest to assert VM_Version::supports_data_cache_line_flush() and avoid generating the stub otherwise (stubGenerator_ppc). ppc.ad The predicates are redundant and should better get removed (useless evaluation). cacheWBPreSync could use cost 0 for clearity. (The costs don't have any effect because there is no choice for the matcher.) stubGenerator_ppc.cpp I think checking cmpwi(... is_presync, 1) is ok because the ABI specifies that "bool true" is one Byte with the value 1 and the C calling convention enforces extension to 8 Byte. I would have used andi_ + bne to be on the safe side, but I believe your version is ok. Comment "// post sync => emit 'lwsync'" is wrong. We use 'sync'. Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Mittwoch, 11. Dezember 2019 01:05 > Cc: Andrew Dinn ; Baesken, Matthias > ; Doerr, Martin ; > hotspot-compiler-dev at openjdk.java.net > Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for > JEP-352 > > Hi, > > Could the following change be reviewed please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8234599 > Webrev: http://cr.openjdk.java.net/~gromero/8234599/v1/ > > POWER9 does not have any cache management or barrier instructions aimed > specifically to deal with persistent memory (NVDIMM), hence in Power ISA > v3.0 > there are no instructions like data cache flush/store or sync instructions > specific to persistent memory. Nonetheless, in some cases (like through > hardware > emulation) POWER9 can support NVDIMM and if Linux supports DAX (direct > mapping of > persisnt memory) and a /dev/pmem device is available it's possible to use > data > cache and sync instructions in the ISA (which are not explicitly aimed to > persistent memory) on a memory region backed by DAX, i.e. mapped using > new > mmap() flag MAP_SYNC for persistent memory), so these instructions will > have the > same semantics as the instructions for data cache flush / store and sync on > other architectures supporting NVDIMM and that have explicit instructions to > deal > with persistent memory. > > This change adds support for JEP-352 on POWER9 using 'dcbst' plus a sync > instruction to sync data to a memory backed by DAX (to persistent memory) > when > that's required. > > Moreover, that change also paves the way for supporting NVDIMM on > future > Power CPUs that might have new data cache management and barrier > instructions > explicitly to deal with persistent memory. > > The change was developed and tested using a P9 QEMU VM (emulation) > with NVDIMM > support. For details on how to setup a proper QEMU + Linux kernel w/ a > /dev/pmem > device configured please see recipe in [2]. > > The JVM on a POWER9 machine with a /dev/pmem device properly set and > with that > change applied is able to pass the test for JEP-352 [3]. The JVM is also able > to pass all tests of Mashona library [4]. > > When DAX is not supported, like on POWER8 and POWER9 w/o DAX support, > OS won't > support mmap()'s MAP_SYNC flag, so kernel will return EOPNOTSUP when > code will > try to allocate memory backed by a persistent memory device, so the JVM > will get > a "java.io.IOException: Operation not supported". 
Naturally, on machines > that don't support writebacks or pmem supports_data_cache_line_flush() > will > return false, but even in that case JVM will hit a EOPNOTSUPP and get a > "java.io.IOException: Operation not supported" sooner than it has the > chance to > try to emit any writeback + sync instructions. > > Thank you. > > Best regards, > Gustavo > > [1] http://man7.org/linux/man-pages/man2/mmap.2.html > [2] https://github.com/gromero/nvdimm > [3] > http://hg.openjdk.java.net/jdk/jdk/file/336885e766af/test/jdk/java/nio/Ma > ppedByteBuffer/PmemTest.java > [4] https://github.com/jhalliday/mashona From john.r.rose at oracle.com Wed Dec 11 16:59:53 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 11 Dec 2019 08:59:53 -0800 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes In-Reply-To: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> References: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> Message-ID: <80057428-11F9-41AB-AFE5-533873E05E1B@oracle.com> Whew! Reviewed. I did not (a) verify that the reduction code was correct before the change, nor did I (b) verify that there are no functional changes due to the change, though I did spot-check. Please, tell me the various reduction templates are adequately covered by our tests. For later, I wish we could handle the reduction cases more mechanically. We now have hand-maintained reduction trees for each power of two lane count. Clearly this could be more mechanized, by writing the reduction of 2N lanes to N lanes once by hand, and then applying it lg N times for any particular N. Obviously, don?t do that now. ? John > On Dec 11, 2019, at 3:53 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235756 > > Merge AD instructions for the following vector nodes: > - DivVF/DivVD > - SqrtVF/SqrtVD > - FmaVF/FmaVD > - AddReductionV* > - MulReductionV* > > Individual patches: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov From john.r.rose at oracle.com Wed Dec 11 17:03:29 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 11 Dec 2019 09:03:29 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> Message-ID: <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> Thanks for taking my comments into account. I looked again at your webrev and like it even better. Any of the current micro-versions on the table is OK with me. ? John > On Dec 11, 2019, at 2:31 AM, Vladimir Ivanov wrote: > > Thanks for reviews, Vladimir and John. > > Updated version: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ > > On 11.12.2019 05:48, Vladimir Kozlov wrote: >> In general I don't like using switches in this changes. In most examples you have only 2 instructions to choose from which could be done with 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and never will be hit - you should hit first checks in supported vector size code. 
> > Didn't have strong opinion about them (and still don't), so I refactored most of the switches to branches. Let me know how it looks now. > > Regarding ShouldNotReachHere(): it would be unfortunate if we have to take size code increase into account when using it to mark never-taken. > > Do you prefer "assert(false,...)" instead on for default case in switches? > >> I may prefer to see 2 AD instructions as you had in previous changes. > > Considering the main motivation is to reduce the number of instructions used, that would be a counter change. As I write later to John, I would like to see the dispatching be hidden inside MacroAssembler. It'll address your current concerns, right? > >> In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. > > Good catch! Fixed. > >> Why you need predicate for vabsnegD ? Other length is not supported anyway. > > Agree, fixed. > > > On 11.12.2019 03:00, John Rose wrote: > > Thank you, reviewed. > > > > For consistency, I?d expect to see the AD file mention vshiftq > > instead of in or in addition to the very specific evpsraq. > > Maybe actually call vshiftq (consistent with other parts of > > AD) and comment that it calls evpsraq? I had to look inside the > > macro assembler to verify that evpsraq was properly aligned > > with the other cases. Or just leave a comment saying this is > > what vshiftq would do also, like the other instructions. > > In general, I'd like to see all the hardware-specific dispatching logic to be moved into MacroAssembler and AD file just to call into them. > > But we (Jatin, Sandhya, and me) decided to limit the amount of refactorings and upstream what Jatin ended up with. > > Are you fine with covering evpsraq case in a follow-up change? > > > > For vabsnegF, I suggest adding a comment here: > > > > + predicate(n->as_Vector()->length() == 2 || > > + // case 4 is handled as a 1-operand instruction by vabsneg4F > > + n->as_Vector()->length() == 8 || > > + n->as_Vector()->length() == 16); > > > > I took slightly different route and rewrote it as follows: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html > > +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ > + predicate(n->as_Vector()->length() != 4); // handled by 1-operand instruction vabsneg4F > > instruct vabsneg4F(vec dst, rRegI scratch) %{ > + predicate(n->as_Vector()->length() == 4); > > It looks clearer than the previous version. > > Best regards, > Vladimir Ivanov > > >> Vladimir >> On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >>> https://bugs.openjdk.java.net/browse/JDK-8235719 >>> >>> Merge AD instructions for the following vector nodes: >>> - LShiftV*, RShiftV*, URShiftV* >>> - AbsV* >>> - NegV* >>> >>> Individual patches: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >>> >>> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >>> >>> The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. >>> >>> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >>> >>> Contributed-by: Jatin Bhateja >>> Reviewed-by: vlivanov, sviswanathan, ? 
>>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >>> >>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From tom.rodriguez at oracle.com Wed Dec 11 17:46:42 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 11 Dec 2019 09:46:42 -0800 Subject: RFR(S) 8229961: Assert failure in compiler/graalunit/HotspotTest.java In-Reply-To: References: Message-ID: <05bd6b65-d9ba-d69f-e3c8-1e8d830581c1@oracle.com> Thanks! tom Vladimir Kozlov wrote on 12/10/19 2:14 PM: > Looks good. Thank you for fixing JVMCI tests! > > Vladimir > > On 12/10/19 12:24 PM, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/8229961/webrev >> https://bugs.openjdk.java.net/browse/JDK-8229961 >> >> The JVMCI InstalledCode object maintains a link back to the CodeBlob >> it's associated with.? In several places JVMCI extracts the CodeBlob* >> or nmethod* from the InstalledCode and the operates on it.? In most >> other cases where an nmethod is being examined there are external >> factors keeping it alive but in this case there aren't.? The actions >> of the concurrent sweeper can transition or potentially free the >> nmethod while it's being used.? Getting all the way to freeing would >> take a very adversarial schedule but JVMCI should provide more safety >> for the these code paths. >> >> This fix adds an nmethodLocker to these usages and through careful use >> of locks attempts to ensure that the resulting locked nmethod is alive >> and locked when it's returned.? I had to modify some of the jtreg >> tests because they were badly abusing the InstalledCode API and >> violating some assumptions about the relationship between nmethods and >> their wrapper InstalledCode objects. >> >> Testing was clean apart from a test compilation problem and existing >> issue.? I'm resubmitting with the fixed jtreg test. From vladimir.kozlov at oracle.com Wed Dec 11 18:55:05 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 10:55:05 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> Message-ID: <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Hi Kamath, First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. 
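Roughly something like this in the CPU feature setup (just a sketch to illustrate the idea - the exact place and bit name may differ in your patch):

    // If AVX512 is switched off, do not advertise VBMI2 either, so that
    // VM_Version::supports_vbmi2() alone is enough to guard the new code.
    if (UseAVX < 3) {
      _features &= ~CPU_VBMI2;
    }

Then the assert in assembler_x86.cpp and the checks in stubGenerator_x86_64.cpp can rely on supports_vbmi2() only.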
In vm_version_x86.cpp you need to add more %s in print statement for new output. You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. You need to add intrinsics to Graal's test to ignore them: http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 Thanks, Vladimir On 12/10/19 5:41 PM, Kamath, Smita wrote: > Hi, > > > As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 > > Link to webrev : http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ > > > > I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > [1] https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. 2C 5-471) > > [2] https://software.intel.com/en-us/articles/intel-software-development-emulator > > > Regards, > > Smita Kamath > From vladimir.kozlov at oracle.com Wed Dec 11 19:34:45 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 11:34:45 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Message-ID: <6050ff35-2842-7a02-34f9-5dbe9a260d7c@oracle.com> I know it is a pain but our coding style requires to use {} for 'if' statement. And it will help visually to separate 'return' from 'break' statements. Otherwise looks good. Thanks, Vladimir On 12/11/19 3:55 AM, Vladimir Ivanov wrote: > >> Yes, fully agree. Updated version: >> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ >> >> Got rid of ret_value and enhanced the comments, but also fixed >> long-standing bug you noticed: AbsVF should have the same additional >> checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). > > Additionally, got rid of ret_value in Matcher::match_rule_supported() > and moved Op_RoundDoubleModeV there since it doesn't depend on vector > length: > > +    case Op_RoundDoubleModeV: > +      if (VM_Version::supports_avx() == false) { > +        return false; // 128bit vroundpd is not available > +      } break; > > Best regards, > Vladimir Ivanov > >> On 11.12.2019 02:31, John Rose wrote: >>> Actually I have one more comment about the new classification logic: >>> >>> The 'ret_value' idiom is terrible. >>> >>> I see a function which is complex, with something like this at the top: >>> >>>    if (... simple size check ...) { >>>      ret_value = false;   // size not supported >>>    } ... >>> >>> I want to read something more decisive like this: >>> >>>    if (... simple size check ...) { >>>      return false;   // size not supported >>>    } ... >>> >>> The 'ret_value' thingy adds only noise, and no clarity. >>> >>> It is more than an annoyance, hence my comment here.
>>> The problem is that if I want to understand the quick check >>> above, I have to scroll down *past all the other checks* to see >>> if some joker does ?ret_value = true? before the ?return ret_value?, >>> subverting my understanding of the code. ?Basically, the ?ret_value? >>> nonsense makes the code impossible to break into understandable >>> parts. >>> >>> ? Grumpy John >>> >>> On Dec 10, 2019, at 12:34 PM, John Rose >> > wrote: >>>> >>>>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL >>>>> extensions), but maybe there'll be a moment when Skylake >>>>> (F+CD+BW+DQ+VL) can be chosen as the baseline. >>>> >>>> Yeah, I saw that coming after I visited the trusty intrinsics guide >>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>>>> >>>>> Anyway, I got your idea and it makes perfect sense to me to collect >>>>> such ideas. >>>> >>> From tom.rodriguez at oracle.com Wed Dec 11 20:17:06 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 11 Dec 2019 12:17:06 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> Message-ID: <3a5a5cde-32f9-bb0c-9d7d-fe6866a092ba@oracle.com> http://cr.openjdk.java.net/~never/8229377.1/webrev/ I have moved all the logic over into deoptimize_all_marked and added some comments to the method. I also rearranged the invalidation logic to make it cleaner. I've submitted the new version for testing. tom Tom Rodriguez wrote on 12/10/19 11:21 PM: >> Yes, right. I am confusing because not in all places we do this >> sequence: mark_for_deoptimization, make_not_entrant, patch frames. >> Patching frames are not always at the same place as we do marking. >> >> I also noticed that you changed under which locks make_not_entrant is >> done. May be better to pass nmethod into deoptimize_all_marked() and >> do it there (by default pass NULL)? > > You mean the CodeCache_lock?? The CompiledMethod_lock is only thing > required for make_not_entrant and that's acquired in > make_not_entrant_or_zombie.? The CodeCache_lock is required for the safe > iteration over the CodeCache itself. > > Though I kind of like your suggestion of passing the nmethod to the call > and performing the make_not_entrant call there instead.? It keeps the > logic together.? I'll make that change. > > tom > >> >> Thanks, >> Vladimir >> >>> >>> tom >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>>>> http://cr.openjdk.java.net/~never/8229377/webrev >>>>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>>>> >>>>> This is a minor improvement to the JVMCI invalidate method to avoid >>>>> scanning large code caches when invalidating a single nmethod. >>>>> Instead the nmethod is directly made not_entrant.? In general I'm >>>>> unclear what the benefit of the >>>>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>>>> Testing is in progress. >>>>> >>>>> JDK-8230884 had been previously duplicated against this because >>>>> they overlapped a bit, but in the interest of clarity I separated >>>>> them again. 
>>>>> >>>>> tom From vladimir.kozlov at oracle.com Wed Dec 11 20:42:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 12:42:33 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> Message-ID: Changes mostly fine. My concern is only about vshiftL_arith_reg() for RShiftVL. It should use 'if' instead of switch statement and supported length should be filtered by match_rule_supported_vector(). Predicate should only check UseAVX <= 2. Thanks, Vladimir On 12/11/19 9:03 AM, John Rose wrote: > Thanks for taking my comments into account. > I looked again at your webrev and like it even better. > Any of the current micro-versions on the table is OK with me. > > ? John > >> On Dec 11, 2019, at 2:31 AM, Vladimir Ivanov wrote: >> >> Thanks for reviews, Vladimir and John. >> >> Updated version: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ >> >> On 11.12.2019 05:48, Vladimir Kozlov wrote: >>> In general I don't like using switches in this changes. In most examples you have only 2 instructions to choose from which could be done with 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and never will be hit - you should hit first checks in supported vector size code. >> >> Didn't have strong opinion about them (and still don't), so I refactored most of the switches to branches. Let me know how it looks now. >> >> Regarding ShouldNotReachHere(): it would be unfortunate if we have to take size code increase into account when using it to mark never-taken. >> >> Do you prefer "assert(false,...)" instead on for default case in switches? >> >>> I may prefer to see 2 AD instructions as you had in previous changes. >> >> Considering the main motivation is to reduce the number of instructions used, that would be a counter change. As I write later to John, I would like to see the dispatching be hidden inside MacroAssembler. It'll address your current concerns, right? >> >>> In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. >> >> Good catch! Fixed. >> >>> Why you need predicate for vabsnegD ? Other length is not supported anyway. >> >> Agree, fixed. >> >> >> On 11.12.2019 03:00, John Rose wrote: >>> Thank you, reviewed. >>> >>> For consistency, I?d expect to see the AD file mention vshiftq >>> instead of in or in addition to the very specific evpsraq. >>> Maybe actually call vshiftq (consistent with other parts of >>> AD) and comment that it calls evpsraq? I had to look inside the >>> macro assembler to verify that evpsraq was properly aligned >>> with the other cases. Or just leave a comment saying this is >>> what vshiftq would do also, like the other instructions. >> >> In general, I'd like to see all the hardware-specific dispatching logic to be moved into MacroAssembler and AD file just to call into them. >> >> But we (Jatin, Sandhya, and me) decided to limit the amount of refactorings and upstream what Jatin ended up with. >> >> Are you fine with covering evpsraq case in a follow-up change? 
>> >> >>> For vabsnegF, I suggest adding a comment here: >>> >>> + predicate(n->as_Vector()->length() == 2 || >>> + // case 4 is handled as a 1-operand instruction by vabsneg4F >>> + n->as_Vector()->length() == 8 || >>> + n->as_Vector()->length() == 16); >>> >> >> I took slightly different route and rewrote it as follows: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html >> >> +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ >> + predicate(n->as_Vector()->length() != 4); // handled by 1-operand instruction vabsneg4F >> >> instruct vabsneg4F(vec dst, rRegI scratch) %{ >> + predicate(n->as_Vector()->length() == 4); >> >> It looks clearer than the previous version. >> >> Best regards, >> Vladimir Ivanov >> >> >>> Vladimir >>> On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >>>> https://bugs.openjdk.java.net/browse/JDK-8235719 >>>> >>>> Merge AD instructions for the following vector nodes: >>>> - LShiftV*, RShiftV*, URShiftV* >>>> - AbsV* >>>> - NegV* >>>> >>>> Individual patches: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >>>> >>>> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >>>> >>>> The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. >>>> >>>> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >>>> >>>> Contributed-by: Jatin Bhateja >>>> Reviewed-by: vlivanov, sviswanathan, ? >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >>>> >>>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html > From vladimir.kozlov at oracle.com Wed Dec 11 20:46:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 12:46:33 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: <3a5a5cde-32f9-bb0c-9d7d-fe6866a092ba@oracle.com> References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> <3a5a5cde-32f9-bb0c-9d7d-fe6866a092ba@oracle.com> Message-ID: <94a5c390-78ca-0d29-712e-30fb3e4d5524@oracle.com> Nice. Thanks, Vladimir On 12/11/19 12:17 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8229377.1/webrev/ > > I have moved all the logic over into deoptimize_all_marked and added > some comments to the method.? I also rearranged the invalidation logic > to make it cleaner.? I've submitted the new version for testing. > > tom > > Tom Rodriguez wrote on 12/10/19 11:21 PM: >>> Yes, right. I am confusing because not in all places we do this >>> sequence: mark_for_deoptimization, make_not_entrant, patch frames. >>> Patching frames are not always at the same place as we do marking. >>> >>> I also noticed that you changed under which locks make_not_entrant is >>> done. May be better to pass nmethod into deoptimize_all_marked() and >>> do it there (by default pass NULL)? >> >> You mean the CodeCache_lock?? The CompiledMethod_lock is only thing >> required for make_not_entrant and that's acquired in >> make_not_entrant_or_zombie.? 
The CodeCache_lock is required for the >> safe iteration over the CodeCache itself. >> >> Though I kind of like your suggestion of passing the nmethod to the >> call and performing the make_not_entrant call there instead.? It keeps >> the logic together.? I'll make that change. >> >> tom >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> tom >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>>>>> http://cr.openjdk.java.net/~never/8229377/webrev >>>>>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>>>>> >>>>>> This is a minor improvement to the JVMCI invalidate method to >>>>>> avoid scanning large code caches when invalidating a single >>>>>> nmethod. Instead the nmethod is directly made not_entrant.? In >>>>>> general I'm unclear what the benefit of the >>>>>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>>>>> Testing is in progress. >>>>>> >>>>>> JDK-8230884 had been previously duplicated against this because >>>>>> they overlapped a bit, but in the interest of clarity I separated >>>>>> them again. >>>>>> >>>>>> tom From david.holmes at oracle.com Wed Dec 11 21:02:57 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Dec 2019 07:02:57 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> On 12/12/2019 1:07 am, Reingruber, Richard wrote: > Hi David, > > > Most of the details here are in areas I can comment on in detail, but I > > did take an initial general look at things. > > Thanks for taking the time! Apologies the above should read: "Most of the details here are in areas I *can't* comment on in detail ..." David > > The only thing that jumped out at me is that I think the > > DeoptimizeObjectsALotThread should be a hidden thread. > > > > + bool is_hidden_from_external_view() const { return true; } > > Yes, it should. Will add the method like above. > > > Also I don't see any testing of the DeoptimizeObjectsALotThread. Without > > active testing this will just bit-rot. > > DeoptimizeObjectsALot is meant for stress testing with a larger workload. I will add a minimal test > to keep it fresh. > > > Also on the tests I don't understand your @requires clause: > > > > @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & > > (vm.opt.TieredCompilation != true)) > > > > This seems to require that TieredCompilation is disabled, but tiered is > > our normal mode of operation. ?? > > > > I removed the clause. I guess I wanted to target the tests towards the code they are supposed to > test, and it's easier to analyze failures w/o tiered compilation and with just one compiler thread. > > Additionally I will make use of compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. > > Thanks, > Richard. > > -----Original Message----- > From: David Holmes > Sent: Mittwoch, 11. 
Dezember 2019 08:03 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, > > On 11/12/2019 7:45 am, Reingruber, Richard wrote: >> Hi, >> >> I would like to get reviews please for >> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >> >> Corresponding RFE: >> https://bugs.openjdk.java.net/browse/JDK-8227745 >> >> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >> >> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >> change is being tested at SAP since I posted the first RFR some months ago. >> >> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >> agents request capabilities that allow them to access local variable values. E.g. if you start-up >> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >> from the beginning, well before a debugger attaches -- if ever one should do so. With the >> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >> you'll find more details. > > Most of the details here are in areas I can comment on in detail, but I > did take an initial general look at things. > > The only thing that jumped out at me is that I think the > DeoptimizeObjectsALotThread should be a hidden thread. > > + bool is_hidden_from_external_view() const { return true; } > > Also I don't see any testing of the DeoptimizeObjectsALotThread. Without > active testing this will just bit-rot. > > Also on the tests I don't understand your @requires clause: > > @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & > (vm.opt.TieredCompilation != true)) > > This seems to require that TieredCompilation is disabled, but tiered is > our normal mode of operation. ?? > > Thanks, > David > >> Thanks, >> Richard. >> >> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >> From vladimir.x.ivanov at oracle.com Wed Dec 11 21:39:47 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 00:39:47 +0300 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> Message-ID: > My concern is only about vshiftL_arith_reg() for RShiftVL. It should use > 'if' instead of switch statement and supported length should be filtered > by match_rule_supported_vector(). Predicate should only check UseAVX <= 2. Good point, Vladimir. Fully agree. Will make the adjustment before the push. Best regards, Vladimir Ivanov > On 12/11/19 9:03 AM, John Rose wrote: >> Thanks for taking my comments into account. >> I looked again at your webrev and like it even better. >> Any of the current micro-versions on the table is OK with me. >> >> ? John >> >>> On Dec 11, 2019, at 2:31 AM, Vladimir Ivanov >>> wrote: >>> >>> Thanks for reviews, Vladimir and John. 
>>> >>> Updated version: >>> ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ >>> >>> On 11.12.2019 05:48, Vladimir Kozlov wrote: >>>> In general I don't like using switches in this changes. In most >>>> examples you have only 2 instructions to choose from which could be >>>> done with 'if/else'. 'default: ShouldNotReachHere()' is big code if >>>> inlined and never will be hit - you should hit first checks in >>>> supported vector size code. >>> >>> Didn't have strong opinion about them (and still don't), so I >>> refactored most of the switches to branches.? Let me know how it >>> looks now. >>> >>> Regarding ShouldNotReachHere(): it would be unfortunate if we have to >>> take size code increase into account when using it to mark never-taken. >>> >>> Do you prefer "assert(false,...)" instead on for default case in >>> switches? >>> >>>> I may prefer to see 2 AD instructions as you had in previous changes. >>> >>> Considering the main motivation is to reduce the number of >>> instructions used, that would be a counter change. As I write later >>> to John, I would like to see the dispatching be hidden inside >>> MacroAssembler. It'll address your current concerns, right? >>> >>>> In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. >>> >>> Good catch! Fixed. >>> >>>> Why you need predicate for vabsnegD ? Other length is not supported >>>> anyway. >>> >>> Agree, fixed. >>> >>> >>> On 11.12.2019 03:00, John Rose wrote: >>>> Thank you, reviewed. >>>> >>>> For consistency, I?d expect to see the AD file mention vshiftq >>>> instead of in or in addition to the very specific evpsraq. >>>> Maybe actually call vshiftq (consistent with other parts of >>>> AD) and comment that it calls evpsraq?? I had to look inside the >>>> macro assembler to verify that evpsraq was properly aligned >>>> with the other cases.? Or just leave a comment saying this is >>>> what vshiftq would do also, like the other instructions. >>> >>> In general, I'd like to see all the hardware-specific dispatching >>> logic to be moved into MacroAssembler and AD file just to call into >>> them. >>> >>> But we (Jatin, Sandhya, and me) decided to limit the amount of >>> refactorings and upstream what Jatin ended up with. >>> >>> Are you fine with covering evpsraq case in a follow-up change? >>> >>> >>>> For vabsnegF, I suggest adding a comment here: >>>> >>>> +? predicate(n->as_Vector()->length() == 2 || >>>> +??????????? // case 4 is handled as a 1-operand instruction by >>>> vabsneg4F >>>> +??????????? n->as_Vector()->length() == 8 || >>>> +??????????? n->as_Vector()->length() == 16); >>>> >>> >>> I took slightly different route and rewrote it as follows: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html >>> >>> >>> +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ >>> +? predicate(n->as_Vector()->length() != 4); // handled by 1-operand >>> instruction vabsneg4F >>> >>> instruct vabsneg4F(vec dst, rRegI scratch) %{ >>> +? predicate(n->as_Vector()->length() == 4); >>> >>> It looks clearer than the previous version. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> >>>> Vladimir >>>> On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >>>>> https://bugs.openjdk.java.net/browse/JDK-8235719 >>>>> >>>>> Merge AD instructions for the following vector nodes: >>>>> ??? - LShiftV*, RShiftV*, URShiftV* >>>>> ??? - AbsV* >>>>> ??? 
- NegV* >>>>> >>>>> Individual patches: >>>>> >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >>>>> >>>>> >>>>> As Jatin described, merging is applied only to AD instructions of >>>>> similar shape. There are some more opportunities for >>>>> reduction/merging left, but they are deliberately left out for >>>>> future work. >>>>> >>>>> The patch is derived from the initial version of generic vector >>>>> support [1]. Generic vector support was reviewed earlier [2]. >>>>> >>>>> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >>>>> >>>>> Contributed-by: Jatin Bhateja >>>>> Reviewed-by: vlivanov, sviswanathan, ? >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> [1] >>>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >>>>> >>>>> >>>>> [2] >>>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html >>>>> >> From john.r.rose at oracle.com Wed Dec 11 22:33:16 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 11 Dec 2019 14:33:16 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Message-ID: Two non-grumpy thumbs up! (Braces or not, either way, that?s up to Vladimir K.) > On Dec 11, 2019, at 3:55 AM, Vladimir Ivanov wrote: > > >> Yes, fully agree. Updated version: >> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ >> Got rid of ret_value and enhanced the comments, but also fixed long-standing bug you noticed: AbsVF should have the same additional checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). > > Additionally, got rid of ret_value in Matcher::match_rule_supported() and moved Op_RoundDoubleModeV there since it doesn't depend on vector length: > > + case Op_RoundDoubleModeV: > + if (VM_Version::supports_avx() == false) { > + return false; // 128bit vroundpd is not available > + } break; > > Best regards, > Vladimir Ivanov > >> On 11.12.2019 02:31, John Rose wrote: >>> Actually I have one more comment about the new classification logic: >>> >>> The ?ret_value? idiom is terrible. >>> >>> I see a function which is complex, with something like this at the top: >>> >>> if (? simple size check ?) { >>> ret_value = false; // size not supported >>> } ? >>> >>> I want to read something more decisive like this: >>> >>> if (? simple size check ?) { >>> return false; // size not supported >>> } ? >>> >>> The ?ret_value? thingy adds only noise, and no clarity. >>> >>> It is more than an annoyance, hence my comment here. >>> The problem is that if I want to understand the quick check >>> above, I have to scroll down *past all the other checks* to see >>> if some joker does ?ret_value = true? before the ?return ret_value?, >>> subverting my understanding of the code. Basically, the ?ret_value? >>> nonsense makes the code impossible to break into understandable >>> parts. >>> >>> ? Grumpy John >>> >>> On Dec 10, 2019, at 12:34 PM, John Rose > wrote: >>>> >>>>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. 
>>>> >>>> Yeah, I saw that coming after I visited the trusty intrinsics guide >>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>>>> >>>>> Anyway, I got your idea and it makes perfect sense to me to collect such ideas. >>>> >>> From vladimir.kozlov at oracle.com Wed Dec 11 23:00:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 15:00:08 -0800 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes In-Reply-To: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> References: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> Message-ID: <6ab0d5cd-3720-a08d-2b97-a520ffc0ce34@oracle.com> webrev.03.vsqrt_double changes include code from webrev.02.vsqrt_float and webrev.05.vfma_double - from webrev.04.vfma_float. Otherwise Div, Sqrt, Fma code changes are fine. Even with above mess-up. May be you need to separate Reduction changes into a separate patch so we can have more time to review it. It is more complex. Thanks, Vladimir On 12/11/19 3:53 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235756 > > Merge AD instructions for the following vector nodes: > ? - DivVF/DivVD > ? - SqrtVF/SqrtVD > ? - FmaVF/FmaVD > ? - AddReductionV* > ? - MulReductionV* > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Dec 11 23:30:31 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 15:30:31 -0800 Subject: RFR(XL) 8235634: Update Graal In-Reply-To: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> References: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> Message-ID: Looks good. Testing results are good too. Thanks, Vladimir On 12/10/19 11:11 PM, Igor Veresov wrote: > Webrev: http://cr.openjdk.java.net/~iveresov/8235634/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8235634 > > The list of changes can be found in the JBS issue. > > Thanks, > igor > > > From ekaterina.pavlova at oracle.com Thu Dec 12 00:41:39 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 16:41:39 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC Message-ID: <7f7ece2c-de80-99bc-68ef-e4db180fce47@oracle.com> Please review the urgent fix for regression introduced by JDK-8215728. graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. The tests didn't run before because they had @requires vm.opt.final.EnableJVMCI == true With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore but set it explicitly in graal tests. Not all GCs support JVMCI compiler and in particular ZGC. The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. Also removed graal unit tests from hotspot_compiler_xcomp group. @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. 
JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html testing: tier1-3 regards, -katya From igor.ignatyev at oracle.com Thu Dec 12 00:52:52 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Wed, 11 Dec 2019 16:52:52 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC Message-ID: <84F0084D-E8C3-4DF2-9783-0BC6AC1E8ED8@oracle.com> ? LGTM. One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? ? Igor > On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: > > ? > Please review the urgent fix for regression introduced by JDK-8215728. > > graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. > The tests didn't run before because they had > @requires vm.opt.final.EnableJVMCI == true > > With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore > but set it explicitly in graal tests. > > Not all GCs support JVMCI compiler and in particular ZGC. > The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. > Also removed graal unit tests from hotspot_compiler_xcomp group. > > @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 > webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html > testing: tier1-3 > > regards, > -katya From ekaterina.pavlova at oracle.com Thu Dec 12 01:15:27 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 17:15:27 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC In-Reply-To: <84F0084D-E8C3-4DF2-9783-0BC6AC1E8ED8@oracle.com> References: <84F0084D-E8C3-4DF2-9783-0BC6AC1E8ED8@oracle.com> Message-ID: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> On 12/11/19 4:52 PM, Igor Ignatev wrote: > ? > LGTM. > One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? agree, this is better, regenerated webrev thanks! -katya > ? Igor > >> On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: >> >> ? >> Please review the urgent fix for regression introduced by JDK-8215728. >> >> graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. >> The tests didn't run before because they had >> @requires vm.opt.final.EnableJVMCI == true >> >> With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore >> but set it explicitly in graal tests. >> >> Not all GCs support JVMCI compiler and in particular ZGC. >> The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. >> Also removed graal unit tests from hotspot_compiler_xcomp group. >> >> @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. 
>> >> >> >> ????JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 >> ?webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html >> testing: tier1-3 >> >> regards, >> -katya >> From igor.ignatyev at oracle.com Thu Dec 12 01:21:03 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Wed, 11 Dec 2019 17:21:03 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC In-Reply-To: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> References: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> Message-ID: <8A188BA9-BDC3-4A1B-AD38-ACE16657AA83@oracle.com> Perfect. Ship it. ? Igor > On Dec 11, 2019, at 5:15 PM, Ekaterina Pavlova wrote: > > ?On 12/11/19 4:52 PM, Igor Ignatev wrote: >> ? >> LGTM. >> One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? > > agree, this is better, regenerated webrev > > thanks! > > -katya > >> ? Igor >>>> On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: >>> >>> ? >>> Please review the urgent fix for regression introduced by JDK-8215728. >>> >>> graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. >>> The tests didn't run before because they had >>> @requires vm.opt.final.EnableJVMCI == true >>> >>> With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore >>> but set it explicitly in graal tests. >>> >>> Not all GCs support JVMCI compiler and in particular ZGC. >>> The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. >>> Also removed graal unit tests from hotspot_compiler_xcomp group. >>> >>> @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. >>> >>> >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 >>> webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html >>> testing: tier1-3 >>> >>> regards, >>> -katya >>> > From ekaterina.pavlova at oracle.com Thu Dec 12 01:37:10 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 17:37:10 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC In-Reply-To: <8A188BA9-BDC3-4A1B-AD38-ACE16657AA83@oracle.com> References: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> <8A188BA9-BDC3-4A1B-AD38-ACE16657AA83@oracle.com> Message-ID: <25a68d6b-2f14-654e-8b15-74bb8de5a2ba@oracle.com> done On 12/11/19 5:21 PM, Igor Ignatev wrote: > Perfect. Ship it. > > ? Igor > >> On Dec 11, 2019, at 5:15 PM, Ekaterina Pavlova wrote: >> >> ?On 12/11/19 4:52 PM, Igor Ignatev wrote: >>> ? >>> LGTM. >>> One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? >> >> agree, this is better, regenerated webrev >> >> thanks! >> >> -katya >> >>> ? Igor >>>>> On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: >>>> >>>> ? >>>> Please review the urgent fix for regression introduced by JDK-8215728. >>>> >>>> graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. >>>> The tests didn't run before because they had >>>> @requires vm.opt.final.EnableJVMCI == true >>>> >>>> With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore >>>> but set it explicitly in graal tests. >>>> >>>> Not all GCs support JVMCI compiler and in particular ZGC. >>>> The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. 
>>>> Also removed graal unit tests from hotspot_compiler_xcomp group. >>>> >>>> @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. >>>> >>>> >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 >>>> webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html >>>> testing: tier1-3 >>>> >>>> regards, >>>> -katya >>>> >> > From ekaterina.pavlova at oracle.com Thu Dec 12 03:41:33 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 19:41:33 -0800 Subject: RFR (T/XS) 8235808: Remove graalunit from tier1_compiler_not_xcomp Message-ID: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> Please review one more issue which could result in tier1/tier3 failures. Tier1 supposes to run only subset of graal unit tests as defined by tier1_compiler_graal group. So compiler/graalunit were wrongly included into tier1_compiler_not_xcomp group. JBS: https://bugs.openjdk.java.net/browse/JDK-8235808 webrev: http://cr.openjdk.java.net/~epavlova//8235808/webrev.00/index.html testing: tier1-3 (in progress) thanks, -katya From igor.ignatyev at oracle.com Thu Dec 12 03:53:11 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 11 Dec 2019 19:53:11 -0800 Subject: RFR (T/XS) 8235808: Remove graalunit from tier1_compiler_not_xcomp In-Reply-To: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> References: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> Message-ID: <4A90A180-64AD-46CF-AD76-036F68CC6B2F@oracle.com> thanks for fixing that. reviewed. -- Igor > On Dec 11, 2019, at 7:41 PM, Ekaterina Pavlova wrote: > > Please review one more issue which could result in tier1/tier3 failures. > Tier1 supposes to run only subset of graal unit tests as defined by > tier1_compiler_graal group. So compiler/graalunit were wrongly included > into tier1_compiler_not_xcomp group. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235808 > webrev: http://cr.openjdk.java.net/~epavlova//8235808/webrev.00/index.html > testing: tier1-3 (in progress) > > > thanks, > -katya From igor.veresov at oracle.com Thu Dec 12 04:37:15 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 11 Dec 2019 20:37:15 -0800 Subject: RFR(XL) 8235634: Update Graal In-Reply-To: References: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> Message-ID: <004ABC75-BB6F-4630-99C0-442BE6853F50@oracle.com> Thanks, Vladimir! igor > On Dec 11, 2019, at 3:30 PM, Vladimir Kozlov wrote: > > Looks good. Testing results are good too. > > Thanks, > Vladimir > > On 12/10/19 11:11 PM, Igor Veresov wrote: >> Webrev: http://cr.openjdk.java.net/~iveresov/8235634/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235634 >> The list of changes can be found in the JBS issue. >> Thanks, >> igor From ekaterina.pavlova at oracle.com Thu Dec 12 05:34:40 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 21:34:40 -0800 Subject: RFR (T/XS) 8235808: Remove graalunit from tier1_compiler_not_xcomp In-Reply-To: <4A90A180-64AD-46CF-AD76-036F68CC6B2F@oracle.com> References: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> <4A90A180-64AD-46CF-AD76-036F68CC6B2F@oracle.com> Message-ID: <20baa229-df8a-334d-22a6-7e7e348eeb66@oracle.com> All testing passed, integrated the change. Thanks Igor for prompt review! -katya On 12/11/19 7:53 PM, Igor Ignatyev wrote: > thanks for fixing that. reviewed. 
> -- Igor > >> On Dec 11, 2019, at 7:41 PM, Ekaterina Pavlova wrote: >> >> Please review one more issue which could result in tier1/tier3 failures. >> Tier1 supposes to run only subset of graal unit tests as defined by >> tier1_compiler_graal group. So compiler/graalunit were wrongly included >> into tier1_compiler_not_xcomp group. >> >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235808 >> webrev: http://cr.openjdk.java.net/~epavlova//8235808/webrev.00/index.html >> testing: tier1-3 (in progress) >> >> >> thanks, >> -katya > From felix.yang at huawei.com Thu Dec 12 06:24:25 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 12 Dec 2019 06:24:25 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi, I have created a webrev for the patch: http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ Tested tier1-3 with both aarch64 and x86_64 linux release build. Newly added test case fail without the patch and pass with the patch. Thanks, Felix > -----Original Message----- > From: Yangfei (Felix) > Sent: Wednesday, December 11, 2019 11:12 PM > To: 'Christian Hagedorn' ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation > > Hi Christian, > > Thanks for the suggestions. Comments inlined. > > > -----Original Message----- > > From: Christian Hagedorn [mailto:christian.hagedorn at oracle.com] > > Sent: Wednesday, December 11, 2019 10:54 PM > > To: Yangfei (Felix) ; Tobias Hartmann > > ; hotspot-compiler-dev at openjdk.java.net > > Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 > > compilation > > > > Hi Felix > > > > Thanks for working on that. Your fix also seems to work for JDK-8235700. > > I closed that one as a duplicate of yours. > > > > >>> + for (int i = 0; i < orig_msize; i++) { > > > > Should be uint since orig_msize is a uint > > -- Yes, will modify accordingly when I am preparing webrev. > > > > > >>> + best_align_to_mem_ref = find_align_to_ref(memops, > > >> max_idx); > > >>> + assert(best_align_to_mem_ref == NULL, "sanity"); > > > > You can merge these two lines together into > > assert(find_align_to_ref(memops, > > max_idx) == NULL, "sanity"); since the call belongs to the sanity > > check. Or just surround it by a #ifdef ASSERT. > > -- The purpose of line 721 here is to calculate the max_idx. > So I don't think it's suitable to treat this line as assertion logic. > > > >>> + idx = max_idx; > > > > Is max_idx always guaranteed to be valid and not -1 when accessing it later? > > -- Yes, I think so. > When memops is not empty and the memory ops in memops are not > comparable, find_align_to_ref will always sets its max_idx. > > Thanks, > Felix From vladimir.x.ivanov at oracle.com Thu Dec 12 10:22:06 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 13:22:06 +0300 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes In-Reply-To: <6ab0d5cd-3720-a08d-2b97-a520ffc0ce34@oracle.com> References: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> <6ab0d5cd-3720-a08d-2b97-a520ffc0ce34@oracle.com> Message-ID: Thanks for the reviews, Vladimir & John. > May be you need to separate Reduction changes into a separate patch so > we can have more time to review it. It is more complex. Sure, I filed JDK-8235824 [1] and removed all reduction-related changes from current patch. 
Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8235824 > On 12/11/19 3:53 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ >> https://bugs.openjdk.java.net/browse/JDK-8235756 >> >> Merge AD instructions for the following vector nodes: >> ?? - DivVF/DivVD >> ?? - SqrtVF/SqrtVD >> ?? - FmaVF/FmaVD >> ?? - AddReductionV* >> ?? - MulReductionV* >> >> Individual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual >> >> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Dec 12 10:40:26 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 13:40:26 +0300 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes Message-ID: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/all/ https://bugs.openjdk.java.net/browse/JDK-8235824 Merge AD instructions for the following vector nodes: - AddReductionV* - MulReductionV* Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/individual Testing: tier1-4, test run on different CPU flavors (KNL, CLX) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, jrose, ? Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Dec 12 11:19:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 14:19:57 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes Message-ID: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ https://bugs.openjdk.java.net/browse/JDK-8235825 Merge AD instructions for the following vector nodes: - ReplicateB, ..., ReplicateD Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual Testing: tier1-4, test run on different CPU flavors (KNL, CLX) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Dec 12 14:41:07 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 17:41:07 +0300 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: Hi Jason, > I once experimented [2] with a change to Hotspot to consider our > bridge-like forwarder methods as exempt from the MaxInlineLevel budget, as > is the case for lambda form frames. That change alone delivered most of the > performance benefit of doubling MaxInlneLevel with an unmodified JVM. Nice! The idea to exclude bridge-like methods from max inline level accounting looks very promising. Do you have any plans to continue working on it and contribute as a patch into the mainline at some point? On the heuristic itself, it looks like it can be safely generalized to any methods which just call another method irrespective of how many arguments they pass (and in what order). 
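Just to sketch what such a check could look like (pseudo-code with made-up names, not a worked-out patch): the inliner could skip the depth accounting for callees whose bytecode does nothing but forward to a single call, e.g.

    // hypothetical helper on the inlining path
    static bool is_trivial_forwarder(ciMethod* callee) {
      // tiny body: loads of this/arguments, exactly one invoke*, and a return,
      // with no other control flow or side effects (that part elided here)
      return callee->code_size() <= 10 /* heuristic */ &&
             bytecode_is_single_forwarding_call(callee);   // made-up name
    }

and then leave the inline level untouched when it returns true, similar to what is already done for lambda forms.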
Best regards, Vladimir Ivanov > [1] https://github.com/scala/bug/issues/11627#issuecomment-514490505 > [2] https://gist.github.com/retronym/d27090a68570485c3e329a52db0c7b25 From vladimir.kozlov at oracle.com Thu Dec 12 18:20:25 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 10:20:25 -0800 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> Message-ID: <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> Hi David, Tiered is disabled because we don't want to see compilations and outputs from C1 compiler which does not have EA. The test is specifically written for C2 only (not for C1 or Graal) to verify its Escape Analysis optimization. I did not look in great details into test's code but its analysis may be affected if C1 compiler is also used. Richard may clarify this. thanks, Vladimir On 12/11/19 1:04 PM, David Holmes wrote: > On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >> I will do full review later. I want to comment about test command line. >> >> You don't need vm.opt.TieredCompilation != true in @requires because >> you specified -XX:-TieredCompilation in @run command. > > And per my comment this should be being tested with tiered as well. > > David > >> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >> test from running in Interpreter mode too. >> >> Thanks, >> Vladimir >> >> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>> Hi David, >>> >>> ?? > Most of the details here are in areas I can comment on in >>> detail, but I >>> ?? > did take an initial general look at things. >>> >>> Thanks for taking the time! >>> >>> ?? > The only thing that jumped out at me is that I think the >>> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >>> ?? > >>> ?? > +? bool is_hidden_from_external_view() const { return true; } >>> >>> Yes, it should. Will add the method like above. >>> >>> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>> Without >>> ?? > active testing this will just bit-rot. >>> >>> DeoptimizeObjectsALot is meant for stress testing with a larger >>> workload. I will add a minimal test >>> to keep it fresh. >>> >>> ?? > Also on the tests I don't understand your @requires clause: >>> ?? > >>> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> ?? > (vm.opt.TieredCompilation != true)) >>> ?? > >>> ?? > This seems to require that TieredCompilation is disabled, but >>> tiered is >>> ?? > our normal mode of operation. ?? >>> ?? > >>> >>> I removed the clause. I guess I wanted to target the tests towards >>> the code they are supposed to >>> test, and it's easier to analyze failures w/o tiered compilation and >>> with just one compiler thread. >>> >>> Additionally I will make use of >>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>> >>> Thanks, >>> Richard. >>> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Mittwoch, 11. 
Dezember 2019 08:03 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>> Hi, >>>> >>>> I would like to get reviews please for >>>> >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>> >>>> Corresponding RFE: >>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>> >>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>> >>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>> issues (thanks!). In addition the >>>> change is being tested at SAP since I posted the first RFR some >>>> months ago. >>>> >>>> The intention of this enhancement is to benefit performance wise >>>> from escape analysis even if JVMTI >>>> agents request capabilities that allow them to access local variable >>>> values. E.g. if you start-up >>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>> escape analysis is disabled right >>>> from the beginning, well before a debugger attaches -- if ever one >>>> should do so. With the >>>> enhancement, escape analysis will remain enabled until and after a >>>> debugger attaches. EA based >>>> optimizations are reverted just before an agent acquires the >>>> reference to an object. In the JBS item >>>> you'll find more details. >>> >>> Most of the details here are in areas I can comment on in detail, but I >>> did take an initial general look at things. >>> >>> The only thing that jumped out at me is that I think the >>> DeoptimizeObjectsALotThread should be a hidden thread. >>> >>> +? bool is_hidden_from_external_view() const { return true; } >>> >>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>> active testing this will just bit-rot. >>> >>> Also on the tests I don't understand your @requires clause: >>> >>> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> (vm.opt.TieredCompilation != true)) >>> >>> This seems to require that TieredCompilation is disabled, but tiered is >>> our normal mode of operation. ?? >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Richard. >>>> >>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>> >>>> From aph at redhat.com Thu Dec 12 19:14:53 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 12 Dec 2019 19:14:53 +0000 Subject: RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset Message-ID: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> This assertion failure happens because load/store patterns in aarch64.ad are incorrect. The operands immIOffset and operand immLoffset() are only correct for load/store *byte* offsets. Offsets of sizes greater than a byte should be shifted by the operand size, and misaligned offsets are only allowed for a small range. We get this wrong, so we try to use misaligned byte addresses for sizes greater than byte. This fails at compile time. We've never noticed this before because Java code doesn't generate misaligned offsets, so we can only test this fix by using Unsafe (or a customer of Unsafe such as ByteBuffer.) 
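A sketch of the kind of reproducer that implies -- not the regression test from the webrev; the buffer size and offsets are illustrative only.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MisalignedOffset {
    // ByteBuffer.getLong(int) is an Unsafe customer that can load a 64-bit
    // value at a byte offset that is not a multiple of 8, something ordinary
    // Java field and array accesses never do.
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4096)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        long sum = 0;
        for (int iter = 0; iter < 200_000; iter++) { // warm up so C2 compiles the loop
            for (int off = 1; off < 4096 - 8; off += 9) { // deliberately misaligned offsets
                sum += buf.getLong(off);
            }
        }
        System.out.println(sum);
    }
}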
Wang Zhuo(Zhuoren) wrote a patch for this bug but it was incomplete; the problem reached deeper than we at first realized. Zhuoren's approach was to fix up the code generation after pattern matching, but this masks an efficiency problem. In many cases where we could use offset addresses, we don't because the pattern matcher incorrectly decides offsets are out of range. This patch fixes both problems by using memory types that are correct for all operand sizes. These are memory1, memory2, memory4, etc; this naming fits in with the existing types used by vectors. It does lead to a rather large patch, but it's not quite as bad as it looks because much of the code is auto-generated by the script ad_encode.m4. Unfortunately, in the time since it was written some developers have edited that auto-generated section of aarch64.ad by hand, so I had to move these hand edits out of the way. I have also added big scary // DO NOT EDIT ANYTHING IN THIS SECTION OF THE FILE comments so this doesn't happen again. In a rather belt-and-braces way I've also added some code that fixes up illegal addresses. I did consider removing it, but I left it in because it doesn't hurt. If we ever do generate similar illegal addresses, debug builds will assert. I'm not sure whether to keep this or not. Andrew Dinn will probably have a cow when he sees this patch. :-) OK for HEAD? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From john.r.rose at oracle.com Thu Dec 12 22:20:57 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 12 Dec 2019 14:20:57 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: On Dec 12, 2019, at 6:41 AM, Vladimir Ivanov wrote: > > On the heuristic itself, it looks like it can be safely generalized to any methods which just call another method irrespective of how many arguments they pass (and in what order). Yes, I like that heuristic; it boils down to a method body containing just one non-trivial method invocation, plus "other cheap stuff". I'd want to consider making sure that the "other cheap stuff" which is disregarded includes method invocations that we know are commonly used in adapters (including those of lambda forms), such as: - type adjusting calls: Class.cast, W.valueOf, W.XValue, etc. (incl. internals) - maybe queries such as Class.isInstance - resolved non-invocations like getfield, getstatic, ldc - other stuff that we know "bottoms out" quickly The point about "bottoming out" is that a tree with one main branch and no side branches, or shallow twiggy side branches only, does not expand more than linearly into IR after inlining. The point about focusing on type adaptation stuff is that such stuff tends to fold away when stacked on top of itself. There are only so many type adaptations you can perform repeatedly on the same value before you reach a fix point, and the JIT is good at computing such fix points. The result is that such single-branch trees may be expected to fold into sub-linear IR, compared to the size of the original branch, and that's the win we are after. I suppose the "cheap stuff" calls might be charged to a side counter, which if it gets huge (>100) stops the inline from going deeper. Let the prototyping begin. My $0.02. -- John P.S. I'm glad we are discussing this.
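A concrete sketch of the adapter shape described above -- one non-trivial invocation surrounded by cheap type-adjusting work; the class and names are invented for illustration.

public class Adapters {
    // "Cheap stuff" around a single real call: a Class.cast type adjustment,
    // an unbox, and an Integer.valueOf box. Adapters of this shape tend to
    // fold away once the JIT inlines through a stack of them.
    static Number adapt(Object arg) {
        Integer boxed = Integer.class.cast(arg);            // type adjusting call
        return target(Integer.valueOf(boxed.intValue() + 1));
    }

    static Integer target(Integer x) {                      // the one non-trivial call
        return x;
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += adapt(i).intValue();
        }
        System.out.println(acc);
    }
}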
For the record, we have historically stepped with great caution into changing the inlining heuristics, because beneficial changes, even if small, can cause rare regressions. In an ecosystem as large as ours, even rare regressions can be a significant cost. But now, as we are getting used to the new discipline of updating on a regular 6-month cadence, I think we have a much healthier process in which to detect and fix regressions that may stem from changed heuristics. P.P.S. Do we have a greater risk of regression because there is something broken about our heuristics? Well, there is an art to building heuristics which are stable, when your system dynamics have (like ours) a nearly infinite set of non-linear feedback loops. Our heuristics, being simple, are reasonably stable. Any new heuristics, like the ones I proposed above, should be examined for stability as well as effectiveness. In the end, a perfect heuristic would have to closely emulate the future dynamics of the system being optimized, and that (as everyone knows) is almost certainly no cheaper than just running the system un-optimized. In general, there will always be workloads that defeat any particular heuristic, even if that heuristic produces good (or null) results 99% of the time. From smita.kamath at intel.com Thu Dec 12 22:43:50 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Thu, 12 Dec 2019 22:43:50 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Message-ID: <6563F381B547594081EF9DE181D07912B2D7B0A9@fmsmsx123.amr.corp.intel.com> Hi Vladimir, Thanks for reviewing the code. I will make changes to the code as per your suggestion and submit another webrev for review. I will also post the performance gains expected with this change (with and without VBMI2). The vector instructions in this code will be executed only after a threshold has been reached. I have taken care that the vector code will be executed only on CPUs supporting VBMI2. Thanks, Smita -----Original Message----- From: Vladimir Kozlov Sent: Wednesday, December 11, 2019 10:55 AM To: Kamath, Smita ; 'hotspot compiler' ; Viswanathan, Sandhya Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Hi Kamath, First, a general question. What performance do you see when VBMI2 instructions are *not* used with your new code vs code generated by C2? What improvement do you see when VBMI2 is used? This is to understand whether we need only the VBMI2 version of the intrinsic or not. Second. Sandhya recently pushed 8235510 changes to roll back avx512 code for CRC32 due to performance issues. Does your change have any issues on some Intel CPUs too? Should it be excluded on such CPUs? Third. I would suggest waiting until after we fork JDK 14 with these changes. I think it may be too late for 14 because we would need to test this, including performance testing. In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. For that to work, in vm_version_x86.cpp#l687 clear the CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp in combination with that. I don't think we need a separate flag UseVBMI2 - it could be controlled by the UseAVX flag. We don't have a flag for VNNI or other avx512 instruction subsets.
In vm_version_x86.cpp you need to add more %s in print statement for new output. You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. You need to add intrinsics to Graal's test to ignore them: http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 Thanks, Vladimir On 12/10/19 5:41 PM, Kamath, Smita wrote: > Hi, > > > As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 > > Link to webrev : > http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ > > > > I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > [1] > https://software.intel.com/sites/default/files/managed/39/c5/325462-sd > m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. > 2C 5-471) > > [2] > https://software.intel.com/en-us/articles/intel-software-development-e > mulator > > > Regards, > > Smita Kamath > From richard.reingruber at sap.com Thu Dec 12 23:02:26 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Thu, 12 Dec 2019 23:02:26 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> Message-ID: Hello Vladimir, thanks for having a look. > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip > test from running in Interpreter mode too. Done. > You don't need vm.opt.TieredCompilation != true in @requires because you > specified -XX:-TieredCompilation in @run command. Ok. > The test is specifically written for C2 only (not for C1 or Graal) to > verify its Escape Analysis optimization. > I did not look in great details into test's code but its analysis may be > affected if C1 compiler is also used. > > Richard may clarify this. The test cases aim to get their testmethod 'dontinline_testMethod' compiled by C2. If they get C1 compiled before doesn't matter all that much. I've got a slight preference to disabled tiered compilation for simplicity. Thanks, Richard. -----Original Message----- From: Vladimir Kozlov Sent: Donnerstag, 12. Dezember 2019 19:20 To: David Holmes ; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Reingruber, Richard Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi David, Tiered is disabled because we don't want to see compilations and outputs from C1 compiler which does not have EA. The test is specifically written for C2 only (not for C1 or Graal) to verify its Escape Analysis optimization. I did not look in great details into test's code but its analysis may be affected if C1 compiler is also used. Richard may clarify this. thanks, Vladimir On 12/11/19 1:04 PM, David Holmes wrote: > On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >> I will do full review later. I want to comment about test command line. 
>> >> You don't need vm.opt.TieredCompilation != true in @requires because >> you specified -XX:-TieredCompilation in @run command. > > And per my comment this should be being tested with tiered as well. > > David > >> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >> test from running in Interpreter mode too. >> >> Thanks, >> Vladimir >> >> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>> Hi David, >>> >>> ?? > Most of the details here are in areas I can comment on in >>> detail, but I >>> ?? > did take an initial general look at things. >>> >>> Thanks for taking the time! >>> >>> ?? > The only thing that jumped out at me is that I think the >>> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >>> ?? > >>> ?? > +? bool is_hidden_from_external_view() const { return true; } >>> >>> Yes, it should. Will add the method like above. >>> >>> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>> Without >>> ?? > active testing this will just bit-rot. >>> >>> DeoptimizeObjectsALot is meant for stress testing with a larger >>> workload. I will add a minimal test >>> to keep it fresh. >>> >>> ?? > Also on the tests I don't understand your @requires clause: >>> ?? > >>> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> ?? > (vm.opt.TieredCompilation != true)) >>> ?? > >>> ?? > This seems to require that TieredCompilation is disabled, but >>> tiered is >>> ?? > our normal mode of operation. ?? >>> ?? > >>> >>> I removed the clause. I guess I wanted to target the tests towards >>> the code they are supposed to >>> test, and it's easier to analyze failures w/o tiered compilation and >>> with just one compiler thread. >>> >>> Additionally I will make use of >>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>> >>> Thanks, >>> Richard. >>> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>> Hi, >>>> >>>> I would like to get reviews please for >>>> >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>> >>>> Corresponding RFE: >>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>> >>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>> >>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>> issues (thanks!). In addition the >>>> change is being tested at SAP since I posted the first RFR some >>>> months ago. >>>> >>>> The intention of this enhancement is to benefit performance wise >>>> from escape analysis even if JVMTI >>>> agents request capabilities that allow them to access local variable >>>> values. E.g. if you start-up >>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>> escape analysis is disabled right >>>> from the beginning, well before a debugger attaches -- if ever one >>>> should do so. With the >>>> enhancement, escape analysis will remain enabled until and after a >>>> debugger attaches. EA based >>>> optimizations are reverted just before an agent acquires the >>>> reference to an object. In the JBS item >>>> you'll find more details. 
>>> >>> Most of the details here are in areas I can comment on in detail, but I >>> did take an initial general look at things. >>> >>> The only thing that jumped out at me is that I think the >>> DeoptimizeObjectsALotThread should be a hidden thread. >>> >>> +? bool is_hidden_from_external_view() const { return true; } >>> >>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>> active testing this will just bit-rot. >>> >>> Also on the tests I don't understand your @requires clause: >>> >>> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> (vm.opt.TieredCompilation != true)) >>> >>> This seems to require that TieredCompilation is disabled, but tiered is >>> our normal mode of operation. ?? >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Richard. >>>> >>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>> >>>> From david.holmes at oracle.com Thu Dec 12 23:32:40 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Dec 2019 09:32:40 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> Message-ID: <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> On 13/12/2019 9:02 am, Reingruber, Richard wrote: > Hello Vladimir, > > thanks for having a look. > > > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip > > test from running in Interpreter mode too. > > Done. > > > You don't need vm.opt.TieredCompilation != true in @requires because you > > specified -XX:-TieredCompilation in @run command. > > Ok. > > > The test is specifically written for C2 only (not for C1 or Graal) to > > verify its Escape Analysis optimization. > > I did not look in great details into test's code but its analysis may be > > affected if C1 compiler is also used. > > > > Richard may clarify this. > > The test cases aim to get their testmethod 'dontinline_testMethod' compiled by C2. If they get C1 > compiled before doesn't matter all that much. I've got a slight preference to disabled tiered > compilation for simplicity. My concern - perhaps unfounded - is that this seems to be being tested only in a pure C2 environment when the actual changes will have to operate correctly in a tiered environment (and JVMCI). Thanks, David > Thanks, Richard. > > -----Original Message----- > From: Vladimir Kozlov > Sent: Donnerstag, 12. Dezember 2019 19:20 > To: David Holmes ; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Reingruber, Richard > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi David, > > Tiered is disabled because we don't want to see compilations and outputs > from C1 compiler which does not have EA. > > The test is specifically written for C2 only (not for C1 or Graal) to > verify its Escape Analysis optimization. > I did not look in great details into test's code but its analysis may be > affected if C1 compiler is also used. > > Richard may clarify this. > > thanks, > Vladimir > > On 12/11/19 1:04 PM, David Holmes wrote: >> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>> I will do full review later. I want to comment about test command line. 
>>> >>> You don't need vm.opt.TieredCompilation != true in @requires because >>> you specified -XX:-TieredCompilation in @run command. >> >> And per my comment this should be being tested with tiered as well. >> >> David >> >>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>> test from running in Interpreter mode too. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>> Hi David, >>>> >>>> ?? > Most of the details here are in areas I can comment on in >>>> detail, but I >>>> ?? > did take an initial general look at things. >>>> >>>> Thanks for taking the time! >>>> >>>> ?? > The only thing that jumped out at me is that I think the >>>> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >>>> ?? > >>>> ?? > +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Yes, it should. Will add the method like above. >>>> >>>> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> ?? > active testing this will just bit-rot. >>>> >>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>> workload. I will add a minimal test >>>> to keep it fresh. >>>> >>>> ?? > Also on the tests I don't understand your @requires clause: >>>> ?? > >>>> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> ?? > (vm.opt.TieredCompilation != true)) >>>> ?? > >>>> ?? > This seems to require that TieredCompilation is disabled, but >>>> tiered is >>>> ?? > our normal mode of operation. ?? >>>> ?? > >>>> >>>> I removed the clause. I guess I wanted to target the tests towards >>>> the code they are supposed to >>>> test, and it's easier to analyze failures w/o tiered compilation and >>>> with just one compiler thread. >>>> >>>> Additionally I will make use of >>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>> >>>> Thanks, >>>> Richard. >>>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi Richard, >>>> >>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>> Hi, >>>>> >>>>> I would like to get reviews please for >>>>> >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>> >>>>> Corresponding RFE: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>> >>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>> >>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>> issues (thanks!). In addition the >>>>> change is being tested at SAP since I posted the first RFR some >>>>> months ago. >>>>> >>>>> The intention of this enhancement is to benefit performance wise >>>>> from escape analysis even if JVMTI >>>>> agents request capabilities that allow them to access local variable >>>>> values. E.g. if you start-up >>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>> escape analysis is disabled right >>>>> from the beginning, well before a debugger attaches -- if ever one >>>>> should do so. With the >>>>> enhancement, escape analysis will remain enabled until and after a >>>>> debugger attaches. 
EA based >>>>> optimizations are reverted just before an agent acquires the >>>>> reference to an object. In the JBS item >>>>> you'll find more details. >>>> >>>> Most of the details here are in areas I can comment on in detail, but I >>>> did take an initial general look at things. >>>> >>>> The only thing that jumped out at me is that I think the >>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>> >>>> +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>>> active testing this will just bit-rot. >>>> >>>> Also on the tests I don't understand your @requires clause: >>>> >>>> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> (vm.opt.TieredCompilation != true)) >>>> >>>> This seems to require that TieredCompilation is disabled, but tiered is >>>> our normal mode of operation. ?? >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>> >>>>> From vladimir.kozlov at oracle.com Thu Dec 12 23:37:25 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 15:37:25 -0800 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes In-Reply-To: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> References: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> Message-ID: <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> Looks good. I wish we can do more folding of code but instructions are too different. What is done is enough for this project. Thanks, Vladimir On 12/12/19 2:40 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235824 > > Merge AD instructions for the following vector nodes: > ? - AddReductionV* > ? - MulReductionV* > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, CLX) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, jrose, ? > > Best regards, > Vladimir Ivanov From david.holmes at oracle.com Thu Dec 12 23:55:35 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Dec 2019 09:55:35 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> Message-ID: Hi Richard, Some further queries/concerns: src/hotspot/share/runtime/objectMonitor.cpp Can you please explain the changes to ObjectMonitor::wait: ! _recursions = save // restore the old recursion count ! + jt->get_and_reset_relock_count_after_wait(); // increased by the deferred relock count what is the "deferred relock count"? I gather it relates to "The code was extended to be able to deoptimize objects of a frame that is not the top frame and to let another thread than the owning thread do it." which I don't like the sound of at all when it comes to ObjectMonitor state. So I'd like to understand in detail exactly what is going on here and why. This is a very intrusive change that seems to badly break encapsulation and impacts future changes to ObjectMonitor that are under investigation. 
--- src/hotspot/share/runtime/thread.cpp Can you please explain why JavaThread::wait_for_object_deoptimization has to be handcrafted in this way rather than using proper transitions. We got rid of "deopt suspend" some time ago and it is disturbing to see it being added back (effectively). This seems like it may be something that handshakes could be used for. Thanks, David ----- On 12/12/2019 7:02 am, David Holmes wrote: > On 12/12/2019 1:07 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Most of the details here are in areas I can comment on in detail, >> but I >> ?? > did take an initial general look at things. >> >> Thanks for taking the time! > > Apologies the above should read: > > "Most of the details here are in areas I *can't* comment on in detail ..." > > David > >> ?? > The only thing that jumped out at me is that I think the >> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >> ?? > >> ?? > +? bool is_hidden_from_external_view() const { return true; } >> >> Yes, it should. Will add the method like above. >> >> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >> Without >> ?? > active testing this will just bit-rot. >> >> DeoptimizeObjectsALot is meant for stress testing with a larger >> workload. I will add a minimal test >> to keep it fresh. >> >> ?? > Also on the tests I don't understand your @requires clause: >> ?? > >> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> ?? > (vm.opt.TieredCompilation != true)) >> ?? > >> ?? > This seems to require that TieredCompilation is disabled, but >> tiered is >> ?? > our normal mode of operation. ?? >> ?? > >> >> I removed the clause. I guess I wanted to target the tests towards the >> code they are supposed to >> test, and it's easier to analyze failures w/o tiered compilation and >> with just one compiler thread. >> >> Additionally I will make use of >> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >> >> Thanks, >> Richard. >> >> -----Original Message----- >> From: David Holmes >> Sent: Mittwoch, 11. Dezember 2019 08:03 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to get reviews please for >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>> >>> Corresponding RFE: >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> >>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>> >>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>> issues (thanks!). In addition the >>> change is being tested at SAP since I posted the first RFR some >>> months ago. >>> >>> The intention of this enhancement is to benefit performance wise from >>> escape analysis even if JVMTI >>> agents request capabilities that allow them to access local variable >>> values. E.g. if you start-up >>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>> escape analysis is disabled right >>> from the beginning, well before a debugger attaches -- if ever one >>> should do so. With the >>> enhancement, escape analysis will remain enabled until and after a >>> debugger attaches. 
EA based >>> optimizations are reverted just before an agent acquires the >>> reference to an object. In the JBS item >>> you'll find more details. >> >> Most of the details here are in areas I can comment on in detail, but I >> did take an initial general look at things. >> >> The only thing that jumped out at me is that I think the >> DeoptimizeObjectsALotThread should be a hidden thread. >> >> +? bool is_hidden_from_external_view() const { return true; } >> >> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >> active testing this will just bit-rot. >> >> Also on the tests I don't understand your @requires clause: >> >> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> (vm.opt.TieredCompilation != true)) >> >> This seems to require that TieredCompilation is disabled, but tiered is >> our normal mode of operation. ?? >> >> Thanks, >> David >> >>> Thanks, >>> Richard. >>> >>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>> >>> From vladimir.kozlov at oracle.com Fri Dec 13 00:56:16 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 16:56:16 -0800 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> Message-ID: Yes, David You are correct these changes touch all part of VM and may affect Graal (which also has EA) too. Changes should be tested in all our modes: tiered, C1 only, Graal, Interpreter. And I realized that I only ran tier3-graal testing so I submitted the rest of Graal's tiers now. I had assumed that our current testing (I ran all from tier1 to tier8) should exercise all paths in VM these changes touch. But I may be wrong and it is correct to ask author to add testing in all VM modes to make sure new code in VM's runtime and JVMTI is tested. I do like to keep what current test is doing with C2. May be add an other test for other modes or modify current one to enable to run it in other modes. Thanks, Vladimir On 12/12/19 3:32 PM, David Holmes wrote: > On 13/12/2019 9:02 am, Reingruber, Richard wrote: >> Hello Vladimir, >> >> thanks for having a look. >> >> ?? > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >> ?? > test from running in Interpreter mode too. >> >> Done. >> >> ?? > You don't need vm.opt.TieredCompilation != true in @requires because you >> ?? > specified -XX:-TieredCompilation in @run command. >> >> Ok. >> >> ?? > The test is specifically written for C2 only (not for C1 or Graal) to >> ?? > verify its Escape Analysis optimization. >> ?? > I did not look in great details into test's code but its analysis may be >> ?? > affected if C1 compiler is also used. >> ?? > >> ?? > Richard may clarify this. >> >> The test cases aim to get their testmethod 'dontinline_testMethod' compiled by C2. If they get C1 >> compiled before doesn't matter all that much. I've got a slight preference to disabled tiered >> compilation for simplicity. > > My concern - perhaps unfounded - is that this seems to be being tested only in a pure C2 environment > when the actual changes will have to operate correctly in a tiered environment (and JVMCI). > > Thanks, > David > >> Thanks, Richard. 
>> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Donnerstag, 12. Dezember 2019 19:20 >> To: David Holmes ; hotspot-runtime-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Reingruber, Richard >> >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of >> JVMTI Agents >> >> Hi David, >> >> Tiered is disabled because we don't want to see compilations and outputs >> from C1 compiler which does not have EA. >> >> The test is specifically written for C2 only (not for C1 or Graal) to >> verify its Escape Analysis optimization. >> I did not look in great details into test's code but its analysis may be >> affected if C1 compiler is also used. >> >> Richard may clarify this. >> >> thanks, >> Vladimir >> >> On 12/11/19 1:04 PM, David Holmes wrote: >>> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>>> I will do full review later. I want to comment about test command line. >>>> >>>> You don't need vm.opt.TieredCompilation != true in @requires because >>>> you specified -XX:-TieredCompilation in @run command. >>> >>> And per my comment this should be being tested with tiered as well. >>> >>> David >>> >>>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>>> test from running in Interpreter mode too. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>>> Hi David, >>>>> >>>>> ??? > Most of the details here are in areas I can comment on in >>>>> detail, but I >>>>> ??? > did take an initial general look at things. >>>>> >>>>> Thanks for taking the time! >>>>> >>>>> ??? > The only thing that jumped out at me is that I think the >>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>> ??? > >>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Yes, it should. Will add the method like above. >>>>> >>>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>> Without >>>>> ??? > active testing this will just bit-rot. >>>>> >>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>> workload. I will add a minimal test >>>>> to keep it fresh. >>>>> >>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>> ??? > >>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>> ??? > >>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>> tiered is >>>>> ??? > our normal mode of operation. ?? >>>>> ??? > >>>>> >>>>> I removed the clause. I guess I wanted to target the tests towards >>>>> the code they are supposed to >>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>> with just one compiler thread. >>>>> >>>>> Additionally I will make use of >>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Mittwoch, 11. 
Dezember 2019 08:03 >>>>> To: Reingruber, Richard ; >>>>> serviceability-dev at openjdk.java.net; >>>>> hotspot-compiler-dev at openjdk.java.net; >>>>> hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>> Performance in the Presence of JVMTI Agents >>>>> >>>>> Hi Richard, >>>>> >>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>> Hi, >>>>>> >>>>>> I would like to get reviews please for >>>>>> >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>> >>>>>> Corresponding RFE: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>> >>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>> >>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>> issues (thanks!). In addition the >>>>>> change is being tested at SAP since I posted the first RFR some >>>>>> months ago. >>>>>> >>>>>> The intention of this enhancement is to benefit performance wise >>>>>> from escape analysis even if JVMTI >>>>>> agents request capabilities that allow them to access local variable >>>>>> values. E.g. if you start-up >>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>> escape analysis is disabled right >>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>> should do so. With the >>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>> debugger attaches. EA based >>>>>> optimizations are reverted just before an agent acquires the >>>>>> reference to an object. In the JBS item >>>>>> you'll find more details. >>>>> >>>>> Most of the details here are in areas I can comment on in detail, but I >>>>> did take an initial general look at things. >>>>> >>>>> The only thing that jumped out at me is that I think the >>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>> >>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>>>> active testing this will just bit-rot. >>>>> >>>>> Also on the tests I don't understand your @requires clause: >>>>> >>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> (vm.opt.TieredCompilation != true)) >>>>> >>>>> This seems to require that TieredCompilation is disabled, but tiered is >>>>> our normal mode of operation. ?? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>> >>>>>> From vladimir.kozlov at oracle.com Fri Dec 13 01:16:29 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 17:16:29 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> Message-ID: Vladimir replicateB Can you fold it differently? 
ReplB_reg_leg
  predicate(!VM_Version::supports_avx512vlbw());
  ins_encode %{
    uint vlen = vector_length(this);
    __ movdl($dst$$XMMRegister, $src$$Register);
    __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister);
    __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00);
    if (vlen > 8) {
      __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister);
      if (vlen > 16) {
        __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister);
        if (vlen > 32) {
          assert(vlen == 64, "sanity");
          __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, $dst$$XMMRegister, 0x1);

Similarly for ReplB_imm_leg, for which I don't see a new implementation. It should also simplify the avx512 code, which needs only one or two instructions. The other types' changes can be done the same way. Thanks, Vladimir On 12/12/19 3:19 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235825 > > Merge AD instructions for the following vector nodes: > - ReplicateB, ..., ReplicateD > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, CLX) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov From john.r.rose at oracle.com Fri Dec 13 01:46:03 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 12 Dec 2019 17:46:03 -0800 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes In-Reply-To: <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> References: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> Message-ID: On Dec 12, 2019, at 3:37 PM, Vladimir Kozlov wrote: > > Looks good. I wish we can do more folding of code but instructions are too different. What is done is enough for this project. +1 Reviewed. As I mentioned in the 8235756 thread, a good way to factor the implementation of (associative) reductions would be to reformulate them as the repeated composition of 2N-to-N-lane reductions. For non-associative reductions (floating point), the 2N-to-N pattern is acceptable, *if* the reduction is specified to happen in that order. To get that permission into the contract will require a distinction between reduceSequential and reduceParallel operations in the Vector API. That sequence of 16 vaddss operations is certainly an eyesore, but it's not clear how to improve on it, algorithmically. Perhaps it could be factored into a sequential accumulation operation, to be repeated N times instead of lg N times. -- John From david.holmes at oracle.com Fri Dec 13 01:52:48 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Dec 2019 11:52:48 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> Message-ID: <01509185-7b0b-a269-deb1-799444cf082f@oracle.com> On 13/12/2019 10:56 am, Vladimir Kozlov wrote: > Yes, David > > You are correct these changes touch all part of VM and may affect Graal > (which also has EA) too. > Changes should be tested in all our modes: tiered, C1 only, Graal, > Interpreter. And I realized that I only ran tier3-graal testing so I > submitted the rest of Graal's tiers now.
> > I had assumed that our current testing (I ran all from tier1 to tier8) > should exercise all paths in VM these changes touch. But I may be wrong > and it is correct to ask author to add testing in all VM modes to make > sure new code in VM's runtime and JVMTI is tested. It may be that our existing JVM TI tests will exercise this adequately and that the new tests are more "whitebox" testing than general functional tests. But it is not obvious to me that we do have the coverage we need. Cheers, David > I do like to keep what current test is doing with C2. May be add an > other test for other modes or modify current one to enable to run it in > other modes. > > Thanks, > Vladimir > > On 12/12/19 3:32 PM, David Holmes wrote: >> On 13/12/2019 9:02 am, Reingruber, Richard wrote: >>> Hello Vladimir, >>> >>> thanks for having a look. >>> >>> ?? > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to >>> skip >>> ?? > test from running in Interpreter mode too. >>> >>> Done. >>> >>> ?? > You don't need vm.opt.TieredCompilation != true in @requires >>> because you >>> ?? > specified -XX:-TieredCompilation in @run command. >>> >>> Ok. >>> >>> ?? > The test is specifically written for C2 only (not for C1 or >>> Graal) to >>> ?? > verify its Escape Analysis optimization. >>> ?? > I did not look in great details into test's code but its >>> analysis may be >>> ?? > affected if C1 compiler is also used. >>> ?? > >>> ?? > Richard may clarify this. >>> >>> The test cases aim to get their testmethod 'dontinline_testMethod' >>> compiled by C2. If they get C1 >>> compiled before doesn't matter all that much. I've got a slight >>> preference to disabled tiered >>> compilation for simplicity. >> >> My concern - perhaps unfounded - is that this seems to be being tested >> only in a pure C2 environment when the actual changes will have to >> operate correctly in a tiered environment (and JVMCI). >> >> Thanks, >> David >> >>> Thanks, Richard. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Donnerstag, 12. Dezember 2019 19:20 >>> To: David Holmes ; >>> hotspot-runtime-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> serviceability-dev at openjdk.java.net; Reingruber, Richard >>> >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi David, >>> >>> Tiered is disabled because we don't want to see compilations and outputs >>> from C1 compiler which does not have EA. >>> >>> The test is specifically written for C2 only (not for C1 or Graal) to >>> verify its Escape Analysis optimization. >>> I did not look in great details into test's code but its analysis may be >>> affected if C1 compiler is also used. >>> >>> Richard may clarify this. >>> >>> thanks, >>> Vladimir >>> >>> On 12/11/19 1:04 PM, David Holmes wrote: >>>> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>>>> I will do full review later. I want to comment about test command >>>>> line. >>>>> >>>>> You don't need vm.opt.TieredCompilation != true in @requires because >>>>> you specified -XX:-TieredCompilation in @run command. >>>> >>>> And per my comment this should be being tested with tiered as well. >>>> >>>> David >>>> >>>>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>>>> test from running in Interpreter mode too. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>>>> Hi David, >>>>>> >>>>>> ??? 
> Most of the details here are in areas I can comment on in >>>>>> detail, but I >>>>>> ??? > did take an initial general look at things. >>>>>> >>>>>> Thanks for taking the time! >>>>>> >>>>>> ??? > The only thing that jumped out at me is that I think the >>>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> ??? > >>>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Yes, it should. Will add the method like above. >>>>>> >>>>>> ??? > Also I don't see any testing of the >>>>>> DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> ??? > active testing this will just bit-rot. >>>>>> >>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>>> workload. I will add a minimal test >>>>>> to keep it fresh. >>>>>> >>>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>>> ??? > >>>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>>> ??? > >>>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> ??? > our normal mode of operation. ?? >>>>>> ??? > >>>>>> >>>>>> I removed the clause. I guess I wanted to target the tests towards >>>>>> the code they are supposed to >>>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>>> with just one compiler thread. >>>>>> >>>>>> Additionally I will make use of >>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>>>> To: Reingruber, Richard ; >>>>>> serviceability-dev at openjdk.java.net; >>>>>> hotspot-compiler-dev at openjdk.java.net; >>>>>> hotspot-runtime-dev at openjdk.java.net >>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>>> Performance in the Presence of JVMTI Agents >>>>>> >>>>>> Hi Richard, >>>>>> >>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I would like to get reviews please for >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>>> >>>>>>> Corresponding RFE: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>>> >>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>>> >>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>>> issues (thanks!). In addition the >>>>>>> change is being tested at SAP since I posted the first RFR some >>>>>>> months ago. >>>>>>> >>>>>>> The intention of this enhancement is to benefit performance wise >>>>>>> from escape analysis even if JVMTI >>>>>>> agents request capabilities that allow them to access local variable >>>>>>> values. E.g. if you start-up >>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>>> escape analysis is disabled right >>>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>>> should do so. With the >>>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>>> debugger attaches. EA based >>>>>>> optimizations are reverted just before an agent acquires the >>>>>>> reference to an object. In the JBS item >>>>>>> you'll find more details. >>>>>> >>>>>> Most of the details here are in areas I can comment on in detail, >>>>>> but I >>>>>> did take an initial general look at things. 
>>>>>> >>>>>> The only thing that jumped out at me is that I think the >>>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> >>>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> active testing this will just bit-rot. >>>>>> >>>>>> Also on the tests I don't understand your @requires clause: >>>>>> >>>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> (vm.opt.TieredCompilation != true)) >>>>>> >>>>>> This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> our normal mode of operation. ?? >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>>> >>>>>>> >>>>>>> From jzaugg at gmail.com Fri Dec 13 02:47:25 2019 From: jzaugg at gmail.com (Jason Zaugg) Date: Fri, 13 Dec 2019 12:47:25 +1000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: On Fri, 13 Dec 2019 at 00:43, Vladimir Ivanov wrote: > > On the heuristic itself, it looks like it can be safely generalized to > any methods which just call another method irrespective of how many > arguments they pass (and in what order). > Agreed. I was initially aiming for minimality to make it dead-simple to reason that the inlinee would not grow and to make the analysis cheap enough to perform eagerly during bytecode parsing. > Nice! The idea to exclude bridge-like methods from max inline level > accounting looks very promising. Do you have any plans to continue > working on it and contribute as a patch into the mainline at some point? > I could spend some time to clean up the patch. However, I would need signficant guidance to extend it to John's suggestion to admit type adaptations / constant loads / known cheap methods etc. Would this more thorough analysis it suit a C2-only analysis performed on an IR instead? Is the set of cheap methods defined with an annotation, a list of owner/name/descriptors or some other means? What is the testing strategy? It would likely be more efficient for someone with the experience in the code base implement themselves rather than shepherd me through it :) -jason From john.r.rose at oracle.com Fri Dec 13 05:07:00 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 12 Dec 2019 21:07:00 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> On Dec 12, 2019, at 6:47 PM, Jason Zaugg wrote: > > I could spend some time to clean up the patch. However, I would need > signficant guidance > to extend it to John's suggestion to admit type adaptations / constant > loads / known > cheap methods etc. Would this more thorough analysis it suit a C2-only > analysis > performed on an IR instead? C2 doesn?t convert a method into IR until it has made the inlining decision about it. It would be a much deeper change to switch that around. However, ciMethod::get_flow_analysis activates a general purpose bytecode-based flow analysis which could easily be adjusted to add the necessary metrics. Since it is already run from the inline logic, this wouldn?t be a new pass. (There?s also ciMethod::get_bcea which performs a much more specialized escape analysis on methods, after they fail to inline. 
I don?t think that?s the place for the new metrics.) > Is the set of cheap methods defined with an annotation, > a list of owner/name/descriptors or some other means? Something along the lines of ciMethod::is_boxing_method will work. In other words, a hardwired white list. Annotations will probably play a role. For example, most of the JIT?s ?magically known methods? sport @HotSpotIntrinsicCandidate, and are registered in vmIntrinsics.hpp. The annotations @ForceInline and @DontInline are another example. We could have a @TinyInline annotation, meaning ?don?t expect this thing to expand much if you inline it?, or we could try to detect such things using further metrics, again in TypeFlow. > What is the testing strategy? The usual: Mainly functional and performance regressions. Some manual and white box testing to make sure the expected sorts of inline chains don?t suddenly stop inlining. > It would likely be more efficient for someone with the experience in the > code base implement themselves rather than shepherd me through it :) I?m intrigued that you are interested in this, and I encourage you to consider pulling on this string some more. I?ll help you pull if you want. I think this is doable, to a degree, as a starter project, to install new parameters and do initial exploration of their settings. Actually dialing in the settings and testing them across a range of workloads is a very specialized job, which few of us are good at, but if the optimization seems to pan out we can find the necessary kind of expert. Your first step, should you choose to accept this mission, would be to join the community. Your name should be on http://openjdk.java.net/census. See http://openjdk.java.net/contribute/. ? John From nick.gasson at arm.com Fri Dec 13 06:10:24 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 13 Dec 2019 14:10:24 +0800 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: <642bcb33-f02f-892b-d258-5ac537fdef4b@arm.com> On 13/12/2019 03:14, Andrew Haley wrote: > > This patch fixes both problems by using memory types that are correct > for all operand sizes. These are memory1, memory2, memory4, etc; this > naming fits in with the existing types used by vectors. > The RFR mail is missing a link to the webrev - is it this one? http://cr.openjdk.java.net/~aph/8235385/ Thanks, Nick From vladimir.x.ivanov at oracle.com Fri Dec 13 08:38:25 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 13 Dec 2019 11:38:25 +0300 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes In-Reply-To: References: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> Message-ID: <6b031827-17f9-6a48-733a-4a78a7b66a51@oracle.com> Thanks for the reviews, Vladimir & John. > As I mentioned in the 8235756 thread, a good way to factor the > implementation of (associative) reductions would be to reformulate > them as the repeated composition of 2N-to-N-lane reductions. > > For non-associative reductions (floating point), the 2N-to-N > pattern is acceptable, *if* the reduction is specified to happen in > that order. To get that permission into the contract will require > a distinction between reduceSequential and reduceParallel > operations in the Vector API. 
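(For illustration, here is a scalar sketch of the two orders being contrasted; it is not Vector API or C2 code, just hypothetical lane values in a plain array. The 2N-to-N form halves the number of lanes at each step, the sequential form accumulates lane by lane, and for floating point the two generally round differently, hence the need to say in the contract which order is permitted.)

  // Sketch only: contrasts the two reduction orders on an array of "lanes".
  // Assumes n is a power of two; neither function exists in the JDK sources.
  #include <cstddef>

  float reduce_parallel(float* lanes, size_t n) {    // 2N -> N at each step
    for (size_t half = n / 2; half >= 1; half /= 2) {
      for (size_t i = 0; i < half; i++) {
        lanes[i] += lanes[i + half];                 // pairwise partial sums
      }
    }
    return lanes[0];    // ((a0+a4)+(a2+a6)) + ((a1+a5)+(a3+a7)) style association
  }

  float reduce_sequential(const float* lanes, size_t n) {
    float acc = lanes[0];
    for (size_t i = 1; i < n; i++) {
      acc += lanes[i];                               // (((a0+a1)+a2)+a3)... order
    }
    return acc;
  }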
> > That sequence of 16 vaddss operations is certainly an eyesore, > but it?s not clear how to improve on it, algorithmically. Perhaps > it could be factored into a sequential accumulation operation, > to be repeated N times instead of lg N times. Yes, I agree that reduction nodes look too high-level for matching purposes: having a node per reduction step is much more suitable (at least, on x86). I think if the IR is shaped that way (nested reduction steps which reduce a vector to a scalar), there's a way to introduce a single shared IR node which represents a reduction step (2N => N) across all vector shapes. Best regards, Vladimir Ivanov From aph at redhat.com Fri Dec 13 09:31:23 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 13 Dec 2019 09:31:23 +0000 Subject: Fwd: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: http://cr.openjdk.java.net/~aph/8235385/ -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 -------------- next part -------------- An embedded message was scrubbed... From: Andrew Haley Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset Date: Thu, 12 Dec 2019 19:14:53 +0000 Size: 12171 URL: From aph at redhat.com Fri Dec 13 09:31:45 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 13 Dec 2019 09:31:45 +0000 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <642bcb33-f02f-892b-d258-5ac537fdef4b@arm.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> <642bcb33-f02f-892b-d258-5ac537fdef4b@arm.com> Message-ID: <1ac65be6-2aa3-6697-46b5-5eec5f52a48e@redhat.com> On 12/13/19 6:10 AM, Nick Gasson wrote: > On 13/12/2019 03:14, Andrew Haley wrote: >> >> This patch fixes both problems by using memory types that are correct >> for all operand sizes. These are memory1, memory2, memory4, etc; this >> naming fits in with the existing types used by vectors. >> > > The RFR mail is missing a link to the webrev - is it this one? > > http://cr.openjdk.java.net/~aph/8235385/ Yes, sorry. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Fri Dec 13 10:27:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 13 Dec 2019 13:27:32 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> Message-ID: Thanks for the feedback, Vladimir. > replicateB > > Can you fold it differently? > > ReplB_reg_leg Are you talking about ReplB_reg? > ? predicate(!VM_Version::supports_avx512vlbw()); 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ 3152 predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); For ReplB_reg_leg the predicate can't be simplified: it is applicable only to 512bit vector when AVX512BW is absent. Otherwise, legVec constraint will be unnecessarily applied to other configurations. 
3119 instruct ReplB_reg(vec dst, rRegI src) %{ 3120 predicate((n->as_Vector()->length() <= 32) || 3121 (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); For ReplB_reg there's a shorter version: predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); But do you find it easier to read? For example, when you are checking that all configurations are covered: predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); instruct ReplB_reg_leg(legVec dst, rRegI src) %{ predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); vs instruct ReplB_reg_leg(legVec dst, rRegI src) %{ predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); instruct ReplB_reg(vec dst, rRegI src) %{ predicate((n->as_Vector()->length() <= 32) || (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); > ? ins_encode %{ > ??? uint vlen = vector_length(this); > ??? __ movdl($dst$$XMMRegister, $src$$Register); > ??? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); > ??? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); > ??? if (vlen > 8) { > ????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); > ????? if (vlen > 16) { > ??????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); > ??????? if (vlen > 32) { > ????????? assert(vlen == 64, "sanity"); > ????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, > $dst$$XMMRegister, 0x1); Yes, it should work as well. Do you find it easier to read though? > Similar ReplB_imm_leg for which I don't see new implementation. Good catch. Added it back. (FTR completeness for reg2reg variants (_reg*) is mandatory. But for _mem and _imm it is optional: if they some configuration isn't covered, _reg is used. But it was in original code, so I added it back.) Updated version: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 Another thing I noticed is that for ReplI/.../ReplD cases avx512vl checks are not necessary: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ The code assumes that evpbroadcastd/evpbroadcastq, vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL for 512bit case, but Intel manual says AVX512F is enough. I plan to handle it as a separate change, but let me know if you want to incorporate it into 8235825. > It should also simplify code for avx512 which one or 2 instructions. Can you elaborate, please? Are you talking about the case when version for different vector sizes differ in 1-2 instructions (like ReplB_reg)? > Other types changes can be done same way. Best regards, Vladimir Ivanov > On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >> https://bugs.openjdk.java.net/browse/JDK-8235825 >> >> Merge AD instructions for the following vector nodes: >> ?? - ReplicateB, ..., ReplicateD >> >> Individual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >> >> >> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? 
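(As a quick aside, one way to convince yourself that the two predicates partition all configurations is a throwaway enumeration like the one below; it is only a sanity-check sketch over the same conditions, nothing that would go into the ad file.)

  #include <cassert>

  // Mirrors of the two predicates above, over (vector length in bytes, AVX512BW).
  static bool repl_b_reg(int vlen, bool avx512bw) {
    return vlen <= 32 || (vlen == 64 && avx512bw);
  }
  static bool repl_b_reg_leg(int vlen, bool avx512bw) {
    return vlen == 64 && !avx512bw;
  }

  int main() {
    const int  lengths[]   = { 4, 8, 16, 32, 64 };
    const bool bw_values[] = { false, true };
    for (int vlen : lengths) {
      for (bool bw : bw_values) {
        // Exactly one of the two register variants must match each configuration.
        assert(repl_b_reg(vlen, bw) != repl_b_reg_leg(vlen, bw));
      }
    }
    return 0;
  }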
>> >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Dec 13 10:44:24 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 13 Dec 2019 13:44:24 +0300 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> Message-ID: >> I could spend some time to clean up the patch. However, I would need >> signficant guidance >> to extend it to John's suggestion to admit type adaptations / constant >> loads / known >> cheap methods etc. Would this more thorough analysis it suit a C2-only >> analysis >> performed on an IR instead? > > C2 doesn?t convert a method into IR until it has made the inlining > decision about it. ?It would be a much deeper change to switch that around. > > However,?ciMethod::get_flow_analysis activates a general purpose > bytecode-based flow analysis which could easily be adjusted to add > the necessary metrics. ?Since it is already run from the inline logic, > this wouldn?t be a new pass. > > (There?s also ciMethod::get_bcea which performs a much more specialized > escape analysis on methods, after they fail to inline. ?I don?t think that?s > the place for the new metrics.) Also, I'd like to add that it's fine to fix it in incrementally: start with something simple and reliable, and then explore more complex extensions on top of it. I don't think it's necessary to bytecode analysis too far. I agree with you that some point it becomes simpler to just parse the method and observe the effects than trying to derive them directly from bytecode. (I fully agree that it requires significant effort to enable IR-based analysis in C2. But also there are ways to workaround that and cache the analysis results across compilations: gather data during stand-alone compilation and then reuse it when doing inlining.) >> Is the set of cheap methods defined with an annotation, >> a list of owner/name/descriptors or some other means? > > Something along the lines of ciMethod::is_boxing_method > will work. ?In other words, a hardwired white list. > Annotations will probably play a role. ?For example, most of the JIT?s > ?magically known methods? sport @HotSpotIntrinsicCandidate, > and are registered in vmIntrinsics.hpp. ?The annotations @ForceInline > and @DontInline are another example. ?We could have a @TinyInline > annotation, meaning ?don?t expect this thing to expand much if you > inline it?, or we could try to detect such things using further metrics, > again in TypeFlow. FTR part of @LambdaForm.Compiled semantics is equivalent to @TinyInline. Best regards, Vladimir Ivanov > >> What is the testing strategy? > > The usual: ?Mainly functional and performance regressions. > Some manual and white box testing to make sure the expected sorts of > inline chains don?t suddenly stop inlining. > >> It would likely be more efficient for someone with the experience in the >> code base implement themselves rather than shepherd me through it :) > > I?m intrigued that you are interested in this, and I encourage you to > consider > pulling on this string some more. ?I?ll help you pull if you want. > > I think this is doable, to a degree, as a starter project, to install > new parameters > and do initial exploration of their settings. 
?Actually dialing in the > settings > and testing them across a range of workloads is a very specialized job, > which > few of us are good at, but if the optimization seems to pan out we can find > the necessary kind of expert. > > Your first step, should you choose to accept this mission, would be to join > the community. ?Your name should be on http://openjdk.java.net/census. > See http://openjdk.java.net/contribute/. > > ? John From christian.hagedorn at oracle.com Fri Dec 13 12:46:26 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 13:46:26 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <87h829a4h5.fsf@redhat.com> References: <87h829a4h5.fsf@redhat.com> Message-ID: <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> Hi Roland As we have discussed offline and also in discussion with Erik ?sterlund, I propose the following alternative fix based on your suggestion: http://cr.openjdk.java.net/~chagedorn/8231501/webrev.01/ The idea is to first snapshot the data and extra parameter data only and translating those. In a second step, the remaining extra data is prepared as before (possibly giving up the extra data lock). And only afterwards, when the cache is populated and no safepoints can happen anymore, the remaining extra data (trap entries and arg info data) is snapshotted and translated while holding the extra data lock. This ensures that no extra data is cleaned between snapshotting and translation since the lock is not released anymore in the meantime. This fixes this observed concurrency bug and is probably a cleaner solution than the original proposed fix. Best regards, Christian On 09.12.19 16:44, Roland Westrelin wrote: > > Hi Christian, > >> Before loading and copying the extra data from the MDO to the ciMDO in >> ciMethodData::load_extra_data(), the metadata is prepared in a >> fixed-point iteration by cleaning all SpeculativeTrapData entries of >> methods whose klasses are unloaded [3]. If it encounters such a dead >> entry it releases the extra data lock (due to ranking issues) and tries >> again later [4]. This release of the lock triggers the bug: There can be >> cases where one thread A is waiting in the whitebox API method to get >> the extra data lock [2] to clean the extra data for the very same MDO >> for which another thread B just released the lock at [4]. If that MDO >> actually contained SpeculativeTrapData entries, then thread A cleaned >> those but the ciMDO, which thread B is preparing, still contains the >> uncleaned old MDO extra data (because thread B only made a snapshot of >> the MDO earlier at [5]). > > Would it be possible to call prepare_data() before the snapshot is taken > so the snapshot doesn't contain any entry that are then removed? > > Roland. > From rwestrel at redhat.com Fri Dec 13 12:58:26 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 13 Dec 2019 13:58:26 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> Message-ID: <87r2189ybx.fsf@redhat.com> Hi Christian, Thanks for experimenting with a different solution. 
> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.01/ 173 // New traps in the MDO may have been added since we copied the 174 // data (concurrent deoptimizations before we acquired 175 // extra_data_lock above) or can be removed (a safepoint may occur 176 // in the prepare_metadata call above) as we translate the copy: 177 // update the copy as we go. Can the above still happen? I see you dropped the memcpy so I suppose no? Roland. From christian.hagedorn at oracle.com Fri Dec 13 13:14:28 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 14:14:28 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <87r2189ybx.fsf@redhat.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> Message-ID: Hi Roland > Thanks for experimenting with a different solution. No problem, I think this approach is better. >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.01/ > > 173 // New traps in the MDO may have been added since we copied the > 174 // data (concurrent deoptimizations before we acquired > 175 // extra_data_lock above) or can be removed (a safepoint may occur > 176 // in the prepare_metadata call above) as we translate the copy: > 177 // update the copy as we go. > > Can the above still happen? > > I see you dropped the memcpy so I suppose no? No, this cannot happen anymore since the lock is not released anymore after the snapshot. I updated the webrev to also remove this comment and the code belonging to the already removed memcpy: http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ Best regards, Christian From rwestrel at redhat.com Fri Dec 13 13:21:41 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 13 Dec 2019 14:21:41 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> Message-ID: <87o8wc9x96.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ That looks good to me. Roland. From christian.hagedorn at oracle.com Fri Dec 13 13:23:53 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 14:23:53 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <87o8wc9x96.fsf@redhat.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> <87o8wc9x96.fsf@redhat.com> Message-ID: <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> Thank you Roland for your review! Best regards, Christian On 13.12.19 14:21, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ > > That looks good to me. > > Roland. 
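(For anyone following along, the ordering that webrev.01/.02 establishes can be summarized roughly as below; the helper names are invented for illustration and are not the actual ciMethodData/MethodData functions.)

  // Rough outline only, with invented helper names.
  void ci_load_method_data(MethodData* mdo) {
    // 1. Snapshot and translate the normal profile data and the extra
    //    parameter data; no trap entries are copied yet.
    copy_and_translate_data_and_parameters(mdo);

    // 2. Clean dead SpeculativeTrapData entries as before. This step may
    //    give up the extra data lock and may safepoint, so the MDO extra
    //    data can still change up to this point.
    prepare_extra_data(mdo);

    // 3. Only now take the extra data lock and, without releasing it (and
    //    with no safepoint possible anymore at this point), snapshot and
    //    translate the trap entries and the arg info data in one go, so
    //    nothing can be cleaned between snapshot and translation.
    {
      MutexLocker ml(mdo->extra_data_lock());
      copy_and_translate_extra_data(mdo);
    }
  }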
> From richard.reingruber at sap.com Fri Dec 13 14:17:00 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 13 Dec 2019 14:17:00 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <01509185-7b0b-a269-deb1-799444cf082f@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> <01509185-7b0b-a269-deb1-799444cf082f@oracle.com> Message-ID: Hi David, Vladimir, The tests are very targeted and customized towards the issues they solve. IMHO they should be run in the configuration they are tailored for, but as I said, I'm ok with removing the tiered options/conditions. The enhancement should be covered also by existing JVMTI, JDI, JDWP tests, assuming they are also executed with Xcomp. If running the tests with Graal as C2 replacement you'll get failures, because the JVMCI compiler does not provide the debug info required at runtime (see compiledVFrame::not_global_escape_in_scope() and compiledVFrame::arg_escape). Still it would be possible to change the tests to expect these failures when executed with Graal. Perhaps I should do this? Thanks, Richard. -----Original Message----- From: David Holmes Sent: Freitag, 13. Dezember 2019 02:53 To: Vladimir Kozlov ; Reingruber, Richard ; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents On 13/12/2019 10:56 am, Vladimir Kozlov wrote: > Yes, David > > You are correct these changes touch all part of VM and may affect Graal > (which also has EA) too. > Changes should be tested in all our modes: tiered, C1 only, Graal, > Interpreter. And I realized that I only ran tier3-graal testing so I > submitted the rest of Graal's tiers now. > > I had assumed that our current testing (I ran all from tier1 to tier8) > should exercise all paths in VM these changes touch. But I may be wrong > and it is correct to ask author to add testing in all VM modes to make > sure new code in VM's runtime and JVMTI is tested. It may be that our existing JVM TI tests will exercise this adequately and that the new tests are more "whitebox" testing than general functional tests. But it is not obvious to me that we do have the coverage we need. Cheers, David > I do like to keep what current test is doing with C2. May be add an > other test for other modes or modify current one to enable to run it in > other modes. > > Thanks, > Vladimir > > On 12/12/19 3:32 PM, David Holmes wrote: >> On 13/12/2019 9:02 am, Reingruber, Richard wrote: >>> Hello Vladimir, >>> >>> thanks for having a look. >>> >>> ?? > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to >>> skip >>> ?? > test from running in Interpreter mode too. >>> >>> Done. >>> >>> ?? > You don't need vm.opt.TieredCompilation != true in @requires >>> because you >>> ?? > specified -XX:-TieredCompilation in @run command. >>> >>> Ok. >>> >>> ?? > The test is specifically written for C2 only (not for C1 or >>> Graal) to >>> ?? > verify its Escape Analysis optimization. >>> ?? > I did not look in great details into test's code but its >>> analysis may be >>> ?? > affected if C1 compiler is also used. >>> ?? > >>> ?? > Richard may clarify this. >>> >>> The test cases aim to get their testmethod 'dontinline_testMethod' >>> compiled by C2. 
If they get C1 >>> compiled before doesn't matter all that much. I've got a slight >>> preference to disabled tiered >>> compilation for simplicity. >> >> My concern - perhaps unfounded - is that this seems to be being tested >> only in a pure C2 environment when the actual changes will have to >> operate correctly in a tiered environment (and JVMCI). >> >> Thanks, >> David >> >>> Thanks, Richard. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Donnerstag, 12. Dezember 2019 19:20 >>> To: David Holmes ; >>> hotspot-runtime-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> serviceability-dev at openjdk.java.net; Reingruber, Richard >>> >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi David, >>> >>> Tiered is disabled because we don't want to see compilations and outputs >>> from C1 compiler which does not have EA. >>> >>> The test is specifically written for C2 only (not for C1 or Graal) to >>> verify its Escape Analysis optimization. >>> I did not look in great details into test's code but its analysis may be >>> affected if C1 compiler is also used. >>> >>> Richard may clarify this. >>> >>> thanks, >>> Vladimir >>> >>> On 12/11/19 1:04 PM, David Holmes wrote: >>>> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>>>> I will do full review later. I want to comment about test command >>>>> line. >>>>> >>>>> You don't need vm.opt.TieredCompilation != true in @requires because >>>>> you specified -XX:-TieredCompilation in @run command. >>>> >>>> And per my comment this should be being tested with tiered as well. >>>> >>>> David >>>> >>>>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>>>> test from running in Interpreter mode too. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>>>> Hi David, >>>>>> >>>>>> ??? > Most of the details here are in areas I can comment on in >>>>>> detail, but I >>>>>> ??? > did take an initial general look at things. >>>>>> >>>>>> Thanks for taking the time! >>>>>> >>>>>> ??? > The only thing that jumped out at me is that I think the >>>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> ??? > >>>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Yes, it should. Will add the method like above. >>>>>> >>>>>> ??? > Also I don't see any testing of the >>>>>> DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> ??? > active testing this will just bit-rot. >>>>>> >>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>>> workload. I will add a minimal test >>>>>> to keep it fresh. >>>>>> >>>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>>> ??? > >>>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>>> ??? > >>>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> ??? > our normal mode of operation. ?? >>>>>> ??? > >>>>>> >>>>>> I removed the clause. I guess I wanted to target the tests towards >>>>>> the code they are supposed to >>>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>>> with just one compiler thread. >>>>>> >>>>>> Additionally I will make use of >>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>>> >>>>>> Thanks, >>>>>> Richard. 
>>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>>>> To: Reingruber, Richard ; >>>>>> serviceability-dev at openjdk.java.net; >>>>>> hotspot-compiler-dev at openjdk.java.net; >>>>>> hotspot-runtime-dev at openjdk.java.net >>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>>> Performance in the Presence of JVMTI Agents >>>>>> >>>>>> Hi Richard, >>>>>> >>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I would like to get reviews please for >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>>> >>>>>>> Corresponding RFE: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>>> >>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>>> >>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>>> issues (thanks!). In addition the >>>>>>> change is being tested at SAP since I posted the first RFR some >>>>>>> months ago. >>>>>>> >>>>>>> The intention of this enhancement is to benefit performance wise >>>>>>> from escape analysis even if JVMTI >>>>>>> agents request capabilities that allow them to access local variable >>>>>>> values. E.g. if you start-up >>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>>> escape analysis is disabled right >>>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>>> should do so. With the >>>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>>> debugger attaches. EA based >>>>>>> optimizations are reverted just before an agent acquires the >>>>>>> reference to an object. In the JBS item >>>>>>> you'll find more details. >>>>>> >>>>>> Most of the details here are in areas I can comment on in detail, >>>>>> but I >>>>>> did take an initial general look at things. >>>>>> >>>>>> The only thing that jumped out at me is that I think the >>>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> >>>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> active testing this will just bit-rot. >>>>>> >>>>>> Also on the tests I don't understand your @requires clause: >>>>>> >>>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> (vm.opt.TieredCompilation != true)) >>>>>> >>>>>> This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> our normal mode of operation. ?? >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>>> >>>>>>> >>>>>>> From christian.hagedorn at oracle.com Fri Dec 13 15:20:17 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 16:20:17 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Felix Thanks for explaining. Following your analysis with the provided test case, orig_msize is 0 in the end. 
Can you also provide a test case or show an example which covers the else case in this test: 717 if (orig_msize == 0) { 718 best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); 719 } else { 720 for (uint i = 0; i < orig_msize; i++) { 721 memops.remove(0); 722 } 723 best_align_to_mem_ref = find_align_to_ref(memops, max_idx); 724 assert(best_align_to_mem_ref == NULL, "sanity"); 725 best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); 726 } Best regards, Christian On 12.12.19 07:24, Yangfei (Felix) wrote: > Hi, > > I have created a webrev for the patch: http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ > Tested tier1-3 with both aarch64 and x86_64 linux release build. > Newly added test case fail without the patch and pass with the patch. > > Thanks, > Felix > >> -----Original Message----- >> From: Yangfei (Felix) >> Sent: Wednesday, December 11, 2019 11:12 PM >> To: 'Christian Hagedorn' ; Tobias Hartmann >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation >> >> Hi Christian, >> >> Thanks for the suggestions. Comments inlined. >> >>> -----Original Message----- >>> From: Christian Hagedorn [mailto:christian.hagedorn at oracle.com] >>> Sent: Wednesday, December 11, 2019 10:54 PM >>> To: Yangfei (Felix) ; Tobias Hartmann >>> ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 >>> compilation >>> >>> Hi Felix >>> >>> Thanks for working on that. Your fix also seems to work for JDK-8235700. >>> I closed that one as a duplicate of yours. >>> >>>>>> + for (int i = 0; i < orig_msize; i++) { >>> >>> Should be uint since orig_msize is a uint >> >> -- Yes, will modify accordingly when I am preparing webrev. >> >>> >>>>>> + best_align_to_mem_ref = find_align_to_ref(memops, >>>>> max_idx); >>>>>> + assert(best_align_to_mem_ref == NULL, "sanity"); >>> >>> You can merge these two lines together into >>> assert(find_align_to_ref(memops, >>> max_idx) == NULL, "sanity"); since the call belongs to the sanity >>> check. Or just surround it by a #ifdef ASSERT. >> >> -- The purpose of line 721 here is to calculate the max_idx. >> So I don't think it's suitable to treat this line as assertion logic. >> >>>>>> + idx = max_idx; >>> >>> Is max_idx always guaranteed to be valid and not -1 when accessing it later? >> >> -- Yes, I think so. >> When memops is not empty and the memory ops in memops are not >> comparable, find_align_to_ref will always sets its max_idx. >> >> Thanks, >> Felix From zhuoren.wz at alibaba-inc.com Fri Dec 13 02:24:41 2019 From: zhuoren.wz at alibaba-inc.com (=?UTF-8?B?V2FuZyBaaHVvKFpodW9yZW4p?=) Date: Fri, 13 Dec 2019 10:24:41 +0800 Subject: =?UTF-8?B?UmU6IFthYXJjaDY0LXBvcnQtZGV2IF0gUkZSOiA4MjM1Mzg1OiBBQXJjaDY0OiBDcmFzaCBv?= =?UTF-8?B?biBhYXJjaDY0IEpESyBkdWUJdG8gbG9uZyBvZmZzZXQ=?= In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: <9448390e-8a43-485a-9060-ab3b7f4250fa.zhuoren.wz@alibaba-inc.com> OK for me. I also found this assertion failure also existed in load. A test for load was uploaded in JBS page. Regards, Zhuoren ------------------------------------------------------------------ From:Andrew Haley Sent At:2019 Dec. 13 (Fri.) 03:15 To:hotspot compiler ; aarch64-port-dev at openjdk.java.net Subject:[aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset This assertion failure happens because load/store patterns in aarch64.ad are incorrect. 
The operands immIOffset and operand immLoffset() are only correct for load/store *byte* offsets. Offsets of sizes greater than a byte should be shifted by the operand size, and misaligned offsets are only allowed for a small range. We get this wrong, so we try to use misaligned byte addresses for sizes greater than a byte. This fails at compile time. We've never noticed this before because Java code doesn't generate misaligned offsets, so we can only test this fix by using Unsafe (or a customer of Unsafe such as ByteBuffer). Wang Zhuo(Zhuoren) wrote a patch for this bug but it was incomplete; the problem reached deeper than we at first realized. Zhuoren's approach was to fix up the code generation after pattern matching, but this masks an efficiency problem. In many cases where we could use offset addresses, we don't because the pattern matcher incorrectly decides offsets are out of range. This patch fixes both problems by using memory types that are correct for all operand sizes. These are memory1, memory2, memory4, etc.; this naming fits in with the existing types used by vectors. It does lead to a rather large patch, but it's not quite as bad as it looks because much of the code is auto-generated by the script ad_encode.m4. Unfortunately, in the time since it was written some developers have edited that auto-generated section of aarch64.ad by hand, so I had to move these hand edits out of the way. I have also added big scary // DO NOT EDIT ANYTHING IN THIS SECTION OF THE FILE comments so this doesn't happen again. In a rather belt-and-braces way I've also added some code that fixes up illegal addresses. I did consider removing it, but I left it in because it doesn't hurt. If we ever do generate similar illegal addresses, debug builds will assert. I'm not sure whether to keep this or not. Andrew Dinn will probably have a cow when he sees this patch. :-) OK for HEAD? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jatin.bhateja at intel.com Fri Dec 13 17:22:54 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Fri, 13 Dec 2019 17:22:54 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class Message-ID: Hi All, Please find below a link to the patch JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ Here the first-level compilation of the method was done by the C1 compiler, since the -Xcomp and -XX:+TieredCompilation (default) options were used; as the back-edge taken count went beyond the threshold for the inner loop, an OSR compilation request was issued to the C2 compiler. Ideal graph construction begins from the hot loop header block and follows the control flow exposed by the ciTypeFlow model (a CFG built over the raw bytecodes), taking branch probabilities into consideration. Loop detection also begins from the hot loop header and does a DFS walk until it encounters a backedge. The newly detected loop containing the mul-add graph pattern is in this case irreducible and not a natural counted loop, which is a hard requirement for vector VNNI pattern detection. Adding a missing check for a counted loop prevents the crash here. Kindly review.
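For reference, a minimal sketch of the kind of guard involved (placeholder names, simplified; the actual patch may differ in placement and detail):

  // 'lpt' stands for the IdealLoopTree under consideration.
  Node* head = lpt->_head;
  if (!head->is_CountedLoop()) {
    // Irreducible or otherwise non-counted loop: VNNI mul-add matching
    // requires a natural counted loop, so bail out here instead of hitting
    // the assert(is_Loop()) failure.
    return false;
  }
  CountedLoopNode* cl = head->as_CountedLoop();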
Regards, Jatin From richard.reingruber at sap.com Fri Dec 13 19:01:57 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 13 Dec 2019 19:01:57 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> Message-ID: Hi David, > Some further queries/concerns: > > src/hotspot/share/runtime/objectMonitor.cpp > > Can you please explain the changes to ObjectMonitor::wait: > > ! _recursions = save // restore the old recursion count > ! + jt->get_and_reset_relock_count_after_wait(); // > increased by the deferred relock count > > what is the "deferred relock count"? I gather it relates to > > "The code was extended to be able to deoptimize objects of a frame that > is not the top frame and to let another thread than the owning thread do > it." Yes, these relate. Currently EA based optimizations are reverted, when a compiled frame is replaced with corresponding interpreter frames. Part of this is relocking objects with eliminated locking. New with the enhancement is that we do this also just before object references are acquired through JVMTI. In this case we deoptimize also the owning compiled frame C and we register deoptimized objects as deferred updates. When control returns to C it gets deoptimized, we notice that objects are already deoptimized (reallocated and relocked), so we don't do it again (relocking twice would be incorrect of course). Deferred updates are copied into the new interpreter frames. Problem: relocking is not possible if the target thread T is waiting on the monitor that needs to be relocked. This happens only with non-local objects with EliminateNestedLocks. Instead relocking is deferred until T owns the monitor again. This is what the piece of code above does. > which I don't like the sound of at all when it comes to ObjectMonitor > state. So I'd like to understand in detail exactly what is going on here > and why. This is a very intrusive change that seems to badly break > encapsulation and impacts future changes to ObjectMonitor that are under > investigation. I would not regard this as breaking encapsulation. Certainly not badly. I've added a property relock_count_after_wait to JavaThread. The property is well encapsulated. Future ObjectMonitor implementations have to deal with recursion too. They are free in choosing a way to do that as long as that property is taken into account. This is hardly a limitation. Note also that the property is a straight forward extension of the existing concept of deferred local updates. It is embedded into the structure holding them. So not even the footprint of a JavaThread is enlarged if no deferred updates are generated. > --- > > src/hotspot/share/runtime/thread.cpp > > Can you please explain why JavaThread::wait_for_object_deoptimization > has to be handcrafted in this way rather than using proper transitions. > I wrote wait_for_object_deoptimization taking JavaThread::java_suspend_self_with_safepoint_check as template. So in short: for the same reasons :) Threads reach both methods as part of thread state transitions, therefore special handling is required to change thread state on top of ongoing transitions. > We got rid of "deopt suspend" some time ago and it is disturbing to see > it being added back (effectively). This seems like it may be something > that handshakes could be used for. Deopt suspend used to be something rather different with a similar name[1]. 
It is not being added back. I'm actually duplicating the existing external suspend mechanism, because a thread can be suspended at most once. And hey, and don't like that either! But it seems not unlikely that the duplicate can be removed together with the original and the new type of handshakes that will be used for thread suspend can be used for object deoptimization too. See today's discussion in JDK-8227745 [2]. Thanks, Richard. [1] Deopt suspend was something like an async. handshake for architectures with register windows, where patching the return pc for deoptimization of a compiled frame was racy if the owner thread was in native code. Instead a "deopt" suspend flag was set on which the thread patched its own frame upon return from native. So no thread was suspended. It got its name only from the name of the flags. [2] Discussion about using handshakes to sync. with the target thread: https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 -----Original Message----- From: David Holmes Sent: Freitag, 13. Dezember 2019 00:56 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Some further queries/concerns: src/hotspot/share/runtime/objectMonitor.cpp Can you please explain the changes to ObjectMonitor::wait: ! _recursions = save // restore the old recursion count ! + jt->get_and_reset_relock_count_after_wait(); // increased by the deferred relock count what is the "deferred relock count"? I gather it relates to "The code was extended to be able to deoptimize objects of a frame that is not the top frame and to let another thread than the owning thread do it." which I don't like the sound of at all when it comes to ObjectMonitor state. So I'd like to understand in detail exactly what is going on here and why. This is a very intrusive change that seems to badly break encapsulation and impacts future changes to ObjectMonitor that are under investigation. --- src/hotspot/share/runtime/thread.cpp Can you please explain why JavaThread::wait_for_object_deoptimization has to be handcrafted in this way rather than using proper transitions. We got rid of "deopt suspend" some time ago and it is disturbing to see it being added back (effectively). This seems like it may be something that handshakes could be used for. Thanks, David ----- On 12/12/2019 7:02 am, David Holmes wrote: > On 12/12/2019 1:07 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Most of the details here are in areas I can comment on in detail, >> but I >> ?? > did take an initial general look at things. >> >> Thanks for taking the time! > > Apologies the above should read: > > "Most of the details here are in areas I *can't* comment on in detail ..." > > David > >> ?? > The only thing that jumped out at me is that I think the >> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >> ?? > >> ?? > +? bool is_hidden_from_external_view() const { return true; } >> >> Yes, it should. Will add the method like above. >> >> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >> Without >> ?? > active testing this will just bit-rot. >> >> DeoptimizeObjectsALot is meant for stress testing with a larger >> workload. I will add a minimal test >> to keep it fresh. >> >> ?? 
> Also on the tests I don't understand your @requires clause: >> ?? > >> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> ?? > (vm.opt.TieredCompilation != true)) >> ?? > >> ?? > This seems to require that TieredCompilation is disabled, but >> tiered is >> ?? > our normal mode of operation. ?? >> ?? > >> >> I removed the clause. I guess I wanted to target the tests towards the >> code they are supposed to >> test, and it's easier to analyze failures w/o tiered compilation and >> with just one compiler thread. >> >> Additionally I will make use of >> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >> >> Thanks, >> Richard. >> >> -----Original Message----- >> From: David Holmes >> Sent: Mittwoch, 11. Dezember 2019 08:03 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to get reviews please for >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>> >>> Corresponding RFE: >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> >>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>> >>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>> issues (thanks!). In addition the >>> change is being tested at SAP since I posted the first RFR some >>> months ago. >>> >>> The intention of this enhancement is to benefit performance wise from >>> escape analysis even if JVMTI >>> agents request capabilities that allow them to access local variable >>> values. E.g. if you start-up >>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>> escape analysis is disabled right >>> from the beginning, well before a debugger attaches -- if ever one >>> should do so. With the >>> enhancement, escape analysis will remain enabled until and after a >>> debugger attaches. EA based >>> optimizations are reverted just before an agent acquires the >>> reference to an object. In the JBS item >>> you'll find more details. >> >> Most of the details here are in areas I can comment on in detail, but I >> did take an initial general look at things. >> >> The only thing that jumped out at me is that I think the >> DeoptimizeObjectsALotThread should be a hidden thread. >> >> +? bool is_hidden_from_external_view() const { return true; } >> >> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >> active testing this will just bit-rot. >> >> Also on the tests I don't understand your @requires clause: >> >> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> (vm.opt.TieredCompilation != true)) >> >> This seems to require that TieredCompilation is disabled, but tiered is >> our normal mode of operation. ?? >> >> Thanks, >> David >> >>> Thanks, >>> Richard. 
>>> >>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>> >>> From vladimir.kozlov at oracle.com Fri Dec 13 19:42:31 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Dec 2019 11:42:31 -0800 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: Message-ID: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Hi Jatin Yes, this fix is correct. But you added trailing spaces to modified lines. Please, fix it. Thanks, Vladimir On 12/13/19 9:22 AM, Bhateja, Jatin wrote: > Hi All, > > Please find below a link to the patch > > JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 > WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ > > Here first level compilation for the method was done by C1 compiler since -Xcomp and -XX:+TieredCompilation (default) options were used, as the back-edge > taken count went beyond threshold for Inner-Loop, OSR compilation request was issued to C2 compiler. > > Ideal construction begins from the hot loop header block and follows the control flows exposed by the ciTypeFlow model (CFG created over raw bytecodes) keeping branch probabilities into consideration. Loop detection also begins from the hot loop header and does a DFS walk till it encounters a backedge, newly detected loop containing the mul-add graph pattern in this case is irreducible and not a natural counted loop which is a must requirement for vector VNNI pattern detection. Adding a missing check for counted loop to prevent crash here. > > Kindly review. > > Regards, > Jatin > From vladimir.kozlov at oracle.com Fri Dec 13 22:18:44 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Dec 2019 14:18:44 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> Message-ID: <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> On 12/13/19 2:27 AM, Vladimir Ivanov wrote: > Thanks for the feedback, Vladimir. > >> replicateB >> >> Can you fold it differently? >> >> ReplB_reg_leg > > Are you talking about ReplB_reg? Yes, I was talking about ReplB_reg. I thought we can combine all [8-64] length vectors but I missed that ReplB_reg_leg uses legVec and needs separate instructions :( And I wanted to separate instruction which use avx 512 (evpbroadcastb) because it is difficult to see relation between predicate condition (length() == 64 && VM_Version::supports_avx512bw()) and check in code (vlen == 64 || VM_Version::supports_avx512vlbw()). First, || vs &&. Second, avx512bw vs avx512vlbw. May be better to have a separate instruction for this. > >> ?? predicate(!VM_Version::supports_avx512vlbw()); > > 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ > 3152?? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); > > For ReplB_reg_leg the predicate can't be simplified: it is applicable only to 512bit vector when AVX512BW is absent. > Otherwise, legVec constraint will be unnecessarily applied to other configurations. That is why you replaced !avx512vlbw with !avx512bw? May be this section of code need comment which explains why one or an other is used. > > 3119 instruct ReplB_reg(vec dst, rRegI src) %{ > 3120?? predicate((n->as_Vector()->length() <= 32) || > 3121???????????? 
(n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); > > For ReplB_reg there's a shorter version: > > predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); > > > But do you find it easier to read? For example, when you are checking that all configurations are covered: > > predicate(n->as_Vector()->length() <= 32 || > ????????? VM_Version::supports_avx512bw()); > > instruct ReplB_reg_leg(legVec dst, rRegI src) %{ > ? predicate(n->as_Vector()->length() == 64 && > ?????????? !VM_Version::supports_avx512bw()); > > vs > > instruct ReplB_reg_leg(legVec dst, rRegI src) %{ > ? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); > > instruct ReplB_reg(vec dst, rRegI src) %{ > ? predicate((n->as_Vector()->length() <= 32) || > ??????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); I think next conditions in predicates require comment about using avx512vlbw and avx512bw. ! predicate((n->as_Vector()->length() <= 32 && VM_Version::supports_avx512vlbw()) || ! (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); > > >> ?? ins_encode %{ >> ???? uint vlen = vector_length(this); >> ???? __ movdl($dst$$XMMRegister, $src$$Register); >> ???? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); >> ???? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); >> ???? if (vlen > 8) { >> ?????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); >> ?????? if (vlen > 16) { >> ???????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); >> ???????? if (vlen > 32) { >> ?????????? assert(vlen == 64, "sanity"); >> ?????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, $dst$$XMMRegister, 0x1); > > Yes, it should work as well. Do you find it easier to read though? Code is smaller. > >> Similar ReplB_imm_leg for which I don't see new implementation. > > Good catch. Added it back. > > (FTR completeness for reg2reg variants (_reg*) is mandatory. But for _mem and _imm it is optional: if they some > configuration isn't covered, _reg is used. But it was in original code, so I added it back.) I don't see code which was in Repl4B_imm() and Repl8B_imm() (only movdl and movq without vpbroadcastb). > > Updated version: > ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 Should be webrev.01 > > Another thing I noticed is that for ReplI/.../ReplD cases avx512vl checks are not necessary: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ > > The code assumes that evpbroadcastd/evpbroadcastq, vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL > for 512bit case, but Intel manual says AVX512F is enough. > > I plan to handle it as a separate change, but let me know if you want to incorporate it into 8235825. Yes, lets do it separately. > >> It should also simplify code for avx512 which one or 2 instructions. > > Can you elaborate, please? Are you talking about the case when version for different vector sizes differ in 1-2 > instructions (like ReplB_reg)? No, I was talking about cases when evpbroadcastb and vpbroadcastb instructions are used. I was think to have them in separate instructions. In your latest version it would be only evpbroadcastb case from ReplB_reg(). Thanks, Vladimir > >> Other types changes can be done same way. 
> > Best regards, > Vladimir Ivanov > >> On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >>> https://bugs.openjdk.java.net/browse/JDK-8235825 >>> >>> Merge AD instructions for the following vector nodes: >>> ?? - ReplicateB, ..., ReplicateD >>> >>> Individual patches: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >>> >>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >>> >>> Contributed-by: Jatin Bhateja >>> Reviewed-by: vlivanov, sviswanathan, ? >>> >>> Best regards, >>> Vladimir Ivanov From jatin.bhateja at intel.com Sun Dec 15 08:41:58 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Sun, 15 Dec 2019 08:41:58 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Vladimir, Updated patch is placed at following link. http://cr.openjdk.java.net/~jbhateja/8230185/webrev.01/ Kindly also push this to the repository. Regards, Jatin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov > Sent: Saturday, December 14, 2019 1:13 AM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class > > Hi Jatin > > Yes, this fix is correct. But you added trailing spaces to modified lines. Please, > fix it. > > Thanks, > Vladimir > > On 12/13/19 9:22 AM, Bhateja, Jatin wrote: > > Hi All, > > > > Please find below a link to the patch > > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 > > WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ > > > > Here first level compilation for the method was done by C1 compiler > > since -Xcomp and -XX:+TieredCompilation (default) options were used, as the > back-edge taken count went beyond threshold for Inner-Loop, OSR > compilation request was issued to C2 compiler. > > > > Ideal construction begins from the hot loop header block and follows the > control flows exposed by the ciTypeFlow model (CFG created over raw > bytecodes) keeping branch probabilities into consideration. Loop detection > also begins from the hot loop header and does a DFS walk till it encounters a > backedge, newly detected loop containing the mul-add graph pattern in this > case is irreducible and not a natural counted loop which is a must requirement > for vector VNNI pattern detection. Adding a missing check for counted loop to > prevent crash here. > > > > Kindly review. > > > > Regards, > > Jatin > > From felix.yang at huawei.com Mon Dec 16 02:47:42 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 16 Dec 2019 02:47:42 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Christian, Yes, orig_msize is 0 for the test case in my webrev. For the else case, I find it hard to manually create a test case for it. Another choice is asserting that this else case never happens. I am not going that way as I haven't got a strong reason for that. Thank, Felix > > Hi Felix > > Thanks for explaining. Following your analysis with the provided test case, > orig_msize is 0 in the end. 
Can you also provide a test case or show an example > which covers the else case in this test: > > 717 if (orig_msize == 0) { > 718 best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > 719 } else { > 720 for (uint i = 0; i < orig_msize; i++) { > 721 memops.remove(0); > 722 } > 723 best_align_to_mem_ref = find_align_to_ref(memops, > max_idx); > 724 assert(best_align_to_mem_ref == NULL, "sanity"); > 725 best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > 726 } From tobias.hartmann at oracle.com Mon Dec 16 07:34:39 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 16 Dec 2019 08:34:39 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> <87o8wc9x96.fsf@redhat.com> <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> Message-ID: <856c26c5-c996-3d4c-cc6b-14db718c78b2@oracle.com> Hi Christian, On 13.12.19 14:23, Christian Hagedorn wrote: >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ This looks good to me. Just noticed some excess whitespace in ciMethodData.cpp:163 ") / HeapWordSize". Please run some quick sanity performance testing before pushing. Best regards, Tobias From tobias.hartmann at oracle.com Mon Dec 16 07:40:36 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 16 Dec 2019 08:40:36 +0100 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Jatin, this looks good to me too but could you please add a regression test (for example, simplified version of the JavaFuzzer generated test)? Also, please set the bug to "In Progress". Thanks, Tobias On 15.12.19 09:41, Bhateja, Jatin wrote: > Hi Vladimir, > > Updated patch is placed at following link. > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.01/ > > Kindly also push this to the repository. > > Regards, > Jatin > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov >> Sent: Saturday, December 14, 2019 1:13 AM >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class >> >> Hi Jatin >> >> Yes, this fix is correct. But you added trailing spaces to modified lines. Please, >> fix it. >> >> Thanks, >> Vladimir >> >> On 12/13/19 9:22 AM, Bhateja, Jatin wrote: >>> Hi All, >>> >>> Please find below a link to the patch >>> >>> JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 >>> WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ >>> >>> Here first level compilation for the method was done by C1 compiler >>> since -Xcomp and -XX:+TieredCompilation (default) options were used, as the >> back-edge taken count went beyond threshold for Inner-Loop, OSR >> compilation request was issued to C2 compiler. >>> >>> Ideal construction begins from the hot loop header block and follows the >> control flows exposed by the ciTypeFlow model (CFG created over raw >> bytecodes) keeping branch probabilities into consideration. 
Loop detection >> also begins from the hot loop header and does a DFS walk till it encounters a >> backedge, newly detected loop containing the mul-add graph pattern in this >> case is irreducible and not a natural counted loop which is a must requirement >> for vector VNNI pattern detection. Adding a missing check for counted loop to >> prevent crash here. >>> >>> Kindly review. >>> >>> Regards, >>> Jatin >>> From christian.hagedorn at oracle.com Mon Dec 16 07:58:07 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 16 Dec 2019 08:58:07 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <856c26c5-c996-3d4c-cc6b-14db718c78b2@oracle.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> <87o8wc9x96.fsf@redhat.com> <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> <856c26c5-c996-3d4c-cc6b-14db718c78b2@oracle.com> Message-ID: Thank you for your review Tobias! On 16.12.19 08:34, Tobias Hartmann wrote: > Hi Christian, > > On 13.12.19 14:23, Christian Hagedorn wrote: >>> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ > > This looks good to me. Just noticed some excess whitespace in ciMethodData.cpp:163 ") / HeapWordSize". Thanks, fixed in current webrev. > Please run some quick sanity performance testing before pushing. Ran some performance tests over the weekend. It looks good. Best regards, Christian From rwestrel at redhat.com Mon Dec 16 08:18:45 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 16 Dec 2019 09:18:45 +0100 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops Message-ID: <87o8w8ptsq.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8231291/webrev.01/ As discussed before: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/035094.html https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036171.html Fully unrolling loops early helps EA. The change to cfgnode.cpp is required because full unroll sometimes needs peeling which may add a phi between a memory access and its AddP, a pattern that EA doesn't recognize. Roland. From tobias.hartmann at oracle.com Mon Dec 16 08:39:16 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 16 Dec 2019 09:39:16 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Felix, On 12.12.19 07:24, Yangfei (Felix) wrote: > I have created a webrev for the patch: http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ > Tested tier1-3 with both aarch64 and x86_64 linux release build. > Newly added test case fail without the patch and pass with the patch. Thanks for creating a webrev and adding the test. I have some questions: - Shouldn't we at least add an assert to verify that after SuperWord::find_adjacent_refs() best_align_to_mem_ref != NULL if _packset.length() > 0? - Why do you need max_idx? Isn't is the case that if find_align_to_ref returns NULL, there aren't any comparable memory operations left and it essentially doesn't matter which one you chose? Also, is it guaranteed that max_idx is always initialized in SuperWord::find_align_to_ref? 
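For the first point, a minimal sketch of the kind of assert meant here (placement and exact wording illustrative only):

    // after the pack creation loop in SuperWord::find_adjacent_refs()
    assert(_packset.length() == 0 || best_align_to_mem_ref != NULL,
           "non-empty packset but no best_align_to_mem_ref selected");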
Thanks, Tobias From felix.yang at huawei.com Mon Dec 16 10:14:24 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 16 Dec 2019 10:14:24 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Tobias, Thanks for reviewing this. > > Hi Felix, > > On 12.12.19 07:24, Yangfei (Felix) wrote: > > I have created a webrev for the patch: > http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ > > Tested tier1-3 with both aarch64 and x86_64 linux release build. > > Newly added test case fail without the patch and pass with the patch. > > Thanks for creating a webrev and adding the test. I have some questions: > - Shouldn't we at least add an assert to verify that after > SuperWord::find_adjacent_refs() best_align_to_mem_ref != NULL if > _packset.length() > 0? -- OK. I will add one assertion immediately before breaking the loop. > - Why do you need max_idx? Isn't is the case that if find_align_to_ref returns > NULL, there aren't any comparable memory operations left and it essentially > doesn't matter which one you chose? -- I was considering minimizing the performance impact of the patch here. best_align_to_mem_ref is used in SuperWord::align_initial_loop_index for adjusting the pre-loop limit. For the test case in the webrev, best_align_to_mem_ref was chosen from node 470 (StoreB) and node 431 (StoreL). The vector width for these two memory operations are different on aarch64 platform: vw = 16 bytes for node 431 and 2 bytes for node 470. SuperWord::align_initial_loop_index will emit different code sequences for the test case. The max_idx tells us which memory operation has the biggest vector size. > Also, is it guaranteed that max_idx is > always initialized in SuperWord::find_align_to_ref? -- Yes, I think so. When memops is not empty and the memory ops in memops are not comparable, find_align_to_ref will always sets its max_idx. From robbin.ehn at oracle.com Mon Dec 16 10:20:39 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 16 Dec 2019 11:20:39 +0100 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> Hi Richard, as mentioned it would be better if you could do this with handshakes, instead of using _suspend_flag (since they are going away). But I can't think of a way doing it without blocking safepoints, so we need to add some more features in handshakes first. When possible I hope you are willing to move this code to handshakes instead. You could stop one thread with, e.g.: class EscapeBarrierSuspendHandshake : public HandshakeClosure { Semaphore _is_waiting; Semaphore _wait; bool _started; public: EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), _wait(0), _started(false) { } void do_thread(Thread* th) { _is_waiting.signal(); _wait.wait(); Atomic::store(&_started, true); } void wait_until_eb_stopped() { _is_waiting.wait(); } void start_thread() { _wait.signal(); while(!Atomic::load(&_started)) { os::naked_yield(); } } }; But it would block safepoints. 
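For illustration, the usage I have in mind on the requesting thread is roughly the sketch below; it assumes an asynchronous Handshake::execute() variant that only queues the closure for the target thread and returns, and the names and error handling are made up:

    EscapeBarrierSuspendHandshake eb;
    Handshake::execute(&eb, target_thread);  // assumed asynchronous: just queues the closure
    eb.wait_until_eb_stopped();              // returns once do_thread() has started on the target
    // ... revert the EA based optimizations of target_thread here, i.e.
    // reallocate scalar replaced objects and relock eliminated locks ...
    eb.start_thread();                       // let the target leave do_thread() and continue

The handshake operation stays in progress for as long as the target sits in do_thread(), so no safepoint can be reached in the meantime; whatever the requester does between wait_until_eb_stopped() and start_thread() therefore must not need one.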
Thanks, Robbin On 12/10/19 10:45 PM, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based > optimizations are reverted just before an agent acquires the reference to an object. In the JBS item > you'll find more details. > > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From goetz.lindenmaier at sap.com Mon Dec 16 13:38:59 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 16 Dec 2019 13:38:59 +0000 Subject: RFR(M): 8235988: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. Message-ID: Hi, PrintInlining and TraceLoopPredicate allocate stringStreams with new and relied on the fact that all memory used is on the ResourceArea cleaned after the compilation. Since 8224193 the char* of the stringStream is malloced and thus must be freed. No doing so manifests a memory leak. This is only relevant if the corresponding tracing is active. To fix TraceLoopPredicate I added the destructor call. Fixing PrintInlining is a bit more complicated, as it uses several stringStreams. A row of them is in a GrowableArray which must be walked to free all of them. As the GrowableArray is on an arena no destructor is called for it. I also changed some as_string() calls to base() calls which reduced memory need of the traces, and added a comment explaining the constructor of GrowableArray that calls the copyconstructor for its elements. Please review: http://cr.openjdk.java.net/~goetz/wr19/8235988-c2_tracing_mem_leak/01/ Best regards, Goetz. From richard.reingruber at sap.com Mon Dec 16 13:41:49 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 16 Dec 2019 13:41:49 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> Message-ID: Hi Robbin, first of all: thanks a lot for providing feedback. I do appreciate it. I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread T1 would apply it on another thread T2. 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? 2. Handshakes between two threads are synchronous, correct? 
If so, then T1 will block handshaking T2, because either T2 or the VMThread will block in L10. I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating objects triggers GC or attempting to execute the vm operation in JvmtiEnv::GetOwnedMonitorStackDepthInfo(). It might be impossible to replace my suspend flag with handshakes that are available today, because if it was you could replace all the suspend flags right away, couldn't you? Or I'm simply missing something... quite possible... :) Thanks, Richard. [1] Drafted by Robbin (thanks!) 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { 2 Semaphore _is_waiting; 3 Semaphore _wait; 4 bool _started; 5 public: 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), 7 _wait(0), _started(false) { } 8 void do_thread(Thread* th) { 9 _is_waiting.signal(); 10 _wait.wait(); 11 Atomic::store(&_started, true); 12 } 13 void wait_until_eb_stopped() { _is_waiting.wait(); } 14 void start_thread() { 15 _wait.signal(); 16 while(!Atomic::load(&_started)) { 17 os::naked_yield(); 18 } 19 } 20 }; -----Original Message----- From: Robbin Ehn Sent: Montag, 16. Dezember 2019 11:21 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, as mentioned it would be better if you could do this with handshakes, instead of using _suspend_flag (since they are going away). But I can't think of a way doing it without blocking safepoints, so we need to add some more features in handshakes first. When possible I hope you are willing to move this code to handshakes instead. You could stop one thread with, e.g.: class EscapeBarrierSuspendHandshake : public HandshakeClosure { Semaphore _is_waiting; Semaphore _wait; bool _started; public: EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), _wait(0), _started(false) { } void do_thread(Thread* th) { _is_waiting.signal(); _wait.wait(); Atomic::store(&_started, true); } void wait_until_eb_stopped() { _is_waiting.wait(); } void start_thread() { _wait.signal(); while(!Atomic::load(&_started)) { os::naked_yield(); } } }; But it would block safepoints. Thanks, Robbin On 12/10/19 10:45 PM, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. 
EA based > optimizations are reverted just before an agent acquires the reference to an object. In the JBS item > you'll find more details. > > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From goetz.lindenmaier at sap.com Mon Dec 16 14:34:00 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 16 Dec 2019 14:34:00 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. Message-ID: Hi, I'm resending this with fixed bugId ... Sorry! Best regards, Goetz > Hi, > > PrintInlining and TraceLoopPredicate allocate stringStreams with new and > relied on the fact that all memory used is on the ResourceArea cleaned > after the compilation. > > Since 8224193 the char* of the stringStream is malloced and thus > must be freed. No doing so manifests a memory leak. > This is only relevant if the corresponding tracing is active. > > To fix TraceLoopPredicate I added the destructor call > Fixing PrintInlining is a bit more complicated, as it uses several > stringStreams. A row of them is in a GrowableArray which must > be walked to free all of them. > As the GrowableArray is on an arena no destructor is called for it. > > I also changed some as_string() calls to base() calls which reduced > memory need of the traces, and added a comment explaining the > constructor of GrowableArray that calls the copyconstructor for its > elements. > > Please review: > http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/01/ > > Best regards, > Goetz. From jatin.bhateja at intel.com Mon Dec 16 15:42:00 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Mon, 16 Dec 2019 15:42:00 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Tobias, Please find below the updated patch with the test case. http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ Kindly push it to the repository if there are no further comments. Thanks, Jatin > -----Original Message----- > From: Tobias Hartmann > Sent: Monday, December 16, 2019 1:11 PM > To: Bhateja, Jatin ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class > > Hi Jatin, > > this looks good to me too but could you please add a regression test (for > example, simplified version of the JavaFuzzer generated test)? > > Also, please set the bug to "In Progress". > > Thanks, > Tobias > > On 15.12.19 09:41, Bhateja, Jatin wrote: > > Hi Vladimir, > > > > Updated patch is placed at following link. > > > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.01/ > > > > Kindly also push this to the repository. > > > > Regards, > > Jatin > > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov > >> Sent: Saturday, December 14, 2019 1:13 AM > >> To: hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid > >> node class > >> > >> Hi Jatin > >> > >> Yes, this fix is correct. But you added trailing spaces to modified > >> lines. Please, fix it. 
> >> > >> Thanks, > >> Vladimir > >> > >> On 12/13/19 9:22 AM, Bhateja, Jatin wrote: > >>> Hi All, > >>> > >>> Please find below a link to the patch > >>> > >>> JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 > >>> WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ > >>> > >>> Here first level compilation for the method was done by C1 compiler > >>> since -Xcomp and -XX:+TieredCompilation (default) options were used, > >>> as the > >> back-edge taken count went beyond threshold for Inner-Loop, OSR > >> compilation request was issued to C2 compiler. > >>> > >>> Ideal construction begins from the hot loop header block and > >>> follows the > >> control flows exposed by the ciTypeFlow model (CFG created over raw > >> bytecodes) keeping branch probabilities into consideration. Loop > >> detection also begins from the hot loop header and does a DFS walk > >> till it encounters a backedge, newly detected loop containing the > >> mul-add graph pattern in this case is irreducible and not a natural > >> counted loop which is a must requirement for vector VNNI pattern > >> detection. Adding a missing check for counted loop to prevent crash here. > >>> > >>> Kindly review. > >>> > >>> Regards, > >>> Jatin > >>> From robbin.ehn at oracle.com Mon Dec 16 17:20:50 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 16 Dec 2019 18:20:50 +0100 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> Message-ID: <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> Hi Richard, On 2019-12-16 14:41, Reingruber, Richard wrote: > Hi Robbin, > > first of all: thanks a lot for providing feedback. I do appreciate it. > > I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. > > Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] > > I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread > T1 would apply it on another thread T2. Sorry I don't immediately see what issue there is in doing a handshake instead of: VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, owned_monitors_list); > > 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? > > 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking > T2, because either T2 or the VMThread will block in L10. Yes, sorry, I forgot/confused myself about asynch handshake. (I have a test prototype for that, which removes suspend flag) > > I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could > continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating > objects triggers GC or attempting to execute the vm operation in > JvmtiEnv::GetOwnedMonitorStackDepthInfo(). > > It might be impossible to replace my suspend flag with handshakes that are available today, because > if it was you could replace all the suspend flags right away, couldn't you? So adding asynch handshakes and a per thread handshake queue, we can. (which this test prototype does) The issue I'm thinking of is if we need selective polling first. Suspend flags are not checked in every transition, e.g. vm->native. A JVM TI agent don't expect to suspend it's own thread when suspending all threads. 
(that thread would be suspended when trying to get back to agent code when it does vm->native transition) > > Or I'm simply missing something... quite possible... :) No I think you got it right. Thanks, Robbin > > Thanks, Richard. > > [1] Drafted by Robbin (thanks!) > > 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { > 2 Semaphore _is_waiting; > 3 Semaphore _wait; > 4 bool _started; > 5 public: > 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > 7 _wait(0), _started(false) { } > 8 void do_thread(Thread* th) { > 9 _is_waiting.signal(); > 10 _wait.wait(); > 11 Atomic::store(&_started, true); > 12 } > 13 void wait_until_eb_stopped() { _is_waiting.wait(); } > 14 void start_thread() { > 15 _wait.signal(); > 16 while(!Atomic::load(&_started)) { > 17 os::naked_yield(); > 18 } > 19 } > 20 }; > > -----Original Message----- > From: Robbin Ehn > Sent: Montag, 16. Dezember 2019 11:21 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, as mentioned it would be better if you could do this with > handshakes, instead of using _suspend_flag (since they are going away). > But I can't think of a way doing it without blocking safepoints, so we need to > add some more features in handshakes first. > When possible I hope you are willing to move this code to handshakes instead. > > You could stop one thread with, e.g.: > class EscapeBarrierSuspendHandshake : public HandshakeClosure { > Semaphore _is_waiting; > Semaphore _wait; > bool _started; > public: > EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > _wait(0), _started(false) { } > void do_thread(Thread* th) { > _is_waiting.signal(); > _wait.wait(); > Atomic::store(&_started, true); > } > void wait_until_eb_stopped() { _is_waiting.wait(); } > void start_thread() { > _wait.signal(); > while(!Atomic::load(&_started)) { > os::naked_yield(); > } > } > }; > > But it would block safepoints. > > Thanks, Robbin > > On 12/10/19 10:45 PM, Reingruber, Richard wrote: >> Hi, >> >> I would like to get reviews please for >> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >> >> Corresponding RFE: >> https://bugs.openjdk.java.net/browse/JDK-8227745 >> >> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >> >> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >> change is being tested at SAP since I posted the first RFR some months ago. >> >> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >> agents request capabilities that allow them to access local variable values. E.g. if you start-up >> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >> from the beginning, well before a debugger attaches -- if ever one should do so. With the >> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >> you'll find more details. >> >> Thanks, >> Richard. 
>> >> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >> From vladimir.x.ivanov at oracle.com Mon Dec 16 22:29:09 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Dec 2019 01:29:09 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> Message-ID: <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Hi Vladimir, Updated version: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ What's changed: * Added more comments * Fixed missing cases (Repl4B_imm() and Repl8B_imm) * Double-checked that there are no other missing cases left: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt I'd like to reiterate that I deliberately don't want to spend too much time polishing the current version because I consider it as an interim point and not as the optimal and desired one (though it's definitely much better than where we started both in number of instructions and clarity). The shape I want to see in the next couple of iterations is the following: (1) all the operation implementations encapsulated in MacroAssembler * CPU dispatching will happen there, not in AD file; (2) get rid of vec vs legVec separation as much as possible * one of the ways to fix it is to introduce additional operand types (for example, vecBW == {legVec when avx512() && !avx512bw(); vec, otherwise}) It would turn current ReplB_reg & ReplB_reg_leg into: instruct ReplB_reg(vecBW dst, scalar src) %{ match(Set dst (ReplicateB src)); format %{ "replicate $dst,$src" %} ins_encode %{ uint vlen = vector_length(this); __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); %} %} where MacroAssembler::replicate_byte() hides all the dispatching logic against CPU capabilities and vector length. Moreover, it opens additional merging opportunities. As an example: instruct ReplBS_reg(vecBW dst, rRegI src) %{ match(Set dst (ReplicateB src)); match(Set dst (ReplicateS src)); format %{ "replicate $dst,$src" %} ins_encode %{ uint vlen = vector_length(this); switch (ideal_Opcode()) { case Op_ReplicateB: __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); break; case Op_ReplicateS: __ replicate_short($dst$$XMMRegister, $src$$Register, vlen); break; default: ShouldNotReachHere(); } %} ins_pipe( pipe_slow ); %} If we agree that this is the direction we want to move, splitting instructions is counter-productive and just pushes more work for later iterations. Same applies to dispatching logic on CPU & vector length. Best regards, Vladimir Ivanov On 14.12.2019 01:18, Vladimir Kozlov wrote: > On 12/13/19 2:27 AM, Vladimir Ivanov wrote: >> Thanks for the feedback, Vladimir. >> >>> replicateB >>> >>> Can you fold it differently? >>> >>> ReplB_reg_leg >> >> Are you talking about ReplB_reg? > > Yes, I was talking about ReplB_reg. I thought we can combine all [8-64] > length vectors but I missed that ReplB_reg_leg uses legVec and needs > separate instructions :( > > And I wanted to separate instruction which use avx 512 (evpbroadcastb) > because it is difficult to see relation between predicate condition > (length() == 64 && VM_Version::supports_avx512bw()) and check in code > (vlen == 64 || VM_Version::supports_avx512vlbw()). First, || vs &&. > Second, avx512bw vs avx512vlbw. 
May be better to have a separate > instruction for this. > >> >>> ?? predicate(!VM_Version::supports_avx512vlbw()); >> >> 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >> 3152?? predicate(n->as_Vector()->length() == 64 && >> !VM_Version::supports_avx512bw()); >> >> For ReplB_reg_leg the predicate can't be simplified: it is applicable >> only to 512bit vector when AVX512BW is absent. Otherwise, legVec >> constraint will be unnecessarily applied to other configurations. > > That is why you replaced !avx512vlbw with !avx512bw? > May be this section of code need comment which explains why one or an > other is used. > >> >> 3119 instruct ReplB_reg(vec dst, rRegI src) %{ >> 3120?? predicate((n->as_Vector()->length() <= 32) || >> 3121???????????? (n->as_Vector()->length() == 64 && >> VM_Version::supports_avx512bw())); >> >> For ReplB_reg there's a shorter version: >> >> predicate(n->as_Vector()->length() <= 32 || >> VM_Version::supports_avx512bw()); >> >> >> But do you find it easier to read? For example, when you are checking >> that all configurations are covered: >> >> predicate(n->as_Vector()->length() <= 32 || >> ?????????? VM_Version::supports_avx512bw()); >> >> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >> ?? predicate(n->as_Vector()->length() == 64 && >> ??????????? !VM_Version::supports_avx512bw()); >> >> vs >> >> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >> ?? predicate(n->as_Vector()->length() == 64 && >> !VM_Version::supports_avx512bw()); >> >> instruct ReplB_reg(vec dst, rRegI src) %{ >> ?? predicate((n->as_Vector()->length() <= 32) || >> ???????????? (n->as_Vector()->length() == 64 && >> VM_Version::supports_avx512bw())); > > I think next conditions in predicates require comment about using > avx512vlbw and avx512bw. > > !?? predicate((n->as_Vector()->length() <= 32 && > VM_Version::supports_avx512vlbw()) || > !???????????? (n->as_Vector()->length() == 64 && > VM_Version::supports_avx512bw())); > >> >> >>> ?? ins_encode %{ >>> ???? uint vlen = vector_length(this); >>> ???? __ movdl($dst$$XMMRegister, $src$$Register); >>> ???? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); >>> ???? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); >>> ???? if (vlen > 8) { >>> ?????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); >>> ?????? if (vlen > 16) { >>> ???????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); >>> ???????? if (vlen > 32) { >>> ?????????? assert(vlen == 64, "sanity"); >>> ?????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, >>> $dst$$XMMRegister, 0x1); >> >> Yes, it should work as well. Do you find it easier to read though? > > Code is smaller. > >> >>> Similar ReplB_imm_leg for which I don't see new implementation. >> >> Good catch. Added it back. >> >> (FTR completeness for reg2reg variants (_reg*) is mandatory. But for >> _mem and _imm it is optional: if they some configuration isn't >> covered, _reg is used. But it was in original code, so I added it back.) > > I don't see code which was in Repl4B_imm() and Repl8B_imm() (only movdl > and movq without vpbroadcastb). > >> >> Updated version: >> ?? 
http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 > > Should be webrev.01 > >> >> Another thing I noticed is that for ReplI/.../ReplD cases avx512vl >> checks are not necessary: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ >> >> >> The code assumes that evpbroadcastd/evpbroadcastq, >> vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL >> for 512bit case, but Intel manual says AVX512F is enough. >> >> I plan to handle it as a separate change, but let me know if you want >> to incorporate it into 8235825. > > Yes, lets do it separately. > >> >>> It should also simplify code for avx512 which one or 2 instructions. >> >> Can you elaborate, please? Are you talking about the case when version >> for different vector sizes differ in 1-2 instructions (like ReplB_reg)? > No, I was talking about cases when evpbroadcastb and vpbroadcastb > instructions are used. I was think to have them in separate > instructions. In your latest version it would be only evpbroadcastb case > from ReplB_reg(). > > Thanks, > Vladimir > >> >>> Other types changes can be done same way. >> >> Best regards, >> Vladimir Ivanov >> >>> On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >>>> https://bugs.openjdk.java.net/browse/JDK-8235825 >>>> >>>> Merge AD instructions for the following vector nodes: >>>> ?? - ReplicateB, ..., ReplicateD >>>> >>>> Individual patches: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >>>> >>>> >>>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >>>> >>>> Contributed-by: Jatin Bhateja >>>> Reviewed-by: vlivanov, sviswanathan, ? >>>> >>>> Best regards, >>>> Vladimir Ivanov From david.holmes at oracle.com Tue Dec 17 07:03:00 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 17 Dec 2019 17:03:00 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> Message-ID: <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> David On 17/12/2019 4:57 pm, David Holmes wrote: > Hi Richard, > > On 14/12/2019 5:01 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Some further queries/concerns: >> ?? > >> ?? > src/hotspot/share/runtime/objectMonitor.cpp >> ?? > >> ?? > Can you please explain the changes to ObjectMonitor::wait: >> ?? > >> ?? > !?? _recursions = save????? // restore the old recursion count >> ?? > !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> ?? > increased by the deferred relock count >> ?? > >> ?? > what is the "deferred relock count"? I gather it relates to >> ?? > >> ?? > "The code was extended to be able to deoptimize objects of a >> frame that >> ?? > is not the top frame and to let another thread than the owning >> thread do >> ?? > it." >> >> Yes, these relate. Currently EA based optimizations are reverted, when >> a compiled frame is replaced >> with corresponding interpreter frames. Part of this is relocking >> objects with eliminated >> locking. New with the enhancement is that we do this also just before >> object references are acquired >> through JVMTI. In this case we deoptimize also the owning compiled >> frame C and we register >> deoptimized objects as deferred updates. 
When control returns to C it >> gets deoptimized, we notice >> that objects are already deoptimized (reallocated and relocked), so we >> don't do it again (relocking >> twice would be incorrect of course). Deferred updates are copied into >> the new interpreter frames. >> >> Problem: relocking is not possible if the target thread T is waiting >> on the monitor that needs to be >> relocked. This happens only with non-local objects with >> EliminateNestedLocks. Instead relocking is >> deferred until T owns the monitor again. This is what the piece of >> code above does. > > Sorry I need some more detail here. How can you wait() on an object > monitor if the object allocation and/or locking was optimised away? And > what is a "non-local object" in this context? Isn't EA restricted to > thread-confined objects? > > Is it just that some of the locking gets optimized away e.g. > > synchronised(obj) { > ? synchronised(obj) { > ??? synchronised(obj) { > ????? obj.wait(); > ??? } > ? } > } > > If this is reduced to a form as-if it were a single lock of the monitor > (due to EA) and the wait() triggers a JVM TI event which leads to the > escape of "obj" then we need to reconstruct the true lock state, and so > when the wait() internally unblocks and reacquires the monitor it has to > set the true recursion count to 3, not the 1 that it appeared to be when > wait() was initially called. Is that the scenario? > > If so I find this truly awful. Anyone using wait() in a realistic form > requires a notification and so the object cannot be thread confined. In > which case I would strongly argue that upon hitting the wait() the deopt > should occur unconditionally and so the lock state is correct before we > wait and so we don't need to mess with the recursion count internally > when we reacquire the monitor. > >> >> ?? > which I don't like the sound of at all when it comes to >> ObjectMonitor >> ?? > state. So I'd like to understand in detail exactly what is going >> on here >> ?? > and why.? This is a very intrusive change that seems to badly break >> ?? > encapsulation and impacts future changes to ObjectMonitor that >> are under >> ?? > investigation. >> >> I would not regard this as breaking encapsulation. Certainly not badly. >> >> I've added a property relock_count_after_wait to JavaThread. The >> property is well >> encapsulated. Future ObjectMonitor implementations have to deal with >> recursion too. They are free in >> choosing a way to do that as long as that property is taken into >> account. This is hardly a >> limitation. > > I do think this badly breaks encapsulation as you have to add a callout > from the guts of the ObjectMonitor code to reach into the thread to get > this lock count adjustment. I understand why you have had to do this but > I would much rather see a change to the EA optimisation strategy so that > this is not needed. > >> Note also that the property is a straight forward extension of the >> existing concept of deferred >> local updates. It is embedded into the structure holding them. So not >> even the footprint of a >> JavaThread is enlarged if no deferred updates are generated. >> >> ?? > --- >> ?? > >> ?? > src/hotspot/share/runtime/thread.cpp >> ?? > >> ?? > Can you please explain why >> JavaThread::wait_for_object_deoptimization >> ?? > has to be handcrafted in this way rather than using proper >> transitions. >> ?? > >> >> I wrote wait_for_object_deoptimization taking >> JavaThread::java_suspend_self_with_safepoint_check >> as template. 
So in short: for the same reasons :) >> >> Threads reach both methods as part of thread state transitions, >> therefore special handling is >> required to change thread state on top of ongoing transitions. >> >> ?? > We got rid of "deopt suspend" some time ago and it is disturbing >> to see >> ?? > it being added back (effectively). This seems like it may be >> something >> ?? > that handshakes could be used for. >> >> Deopt suspend used to be something rather different with a similar >> name[1]. It is not being added back. > > I stand corrected. Despite comments in the code to the contrary > deopt_suspend didn't actually cause a self-suspend. I was doing a lot of > cleanup in this area 13 years ago :) > >> >> I'm actually duplicating the existing external suspend mechanism, >> because a thread can be suspended >> at most once. And hey, and don't like that either! But it seems not >> unlikely that the duplicate can >> be removed together with the original and the new type of handshakes >> that will be used for >> thread suspend can be used for object deoptimization too. See today's >> discussion in JDK-8227745 [2]. > > I hope that discussion bears some fruit, at the moment it seems not to > be possible to use handshakes here. :( > > The external suspend mechanism is a royal pain in the proverbial that we > have to carefully live with. The idea that we're duplicating that for > use in another fringe area of functionality does not thrill me at all. > > To be clear, I understand the problem that exists and that you wish to > solve, but for the runtime parts I balk at the complexity cost of > solving it. > > Thanks, > David > ----- > >> Thanks, Richard. >> >> [1] Deopt suspend was something like an async. handshake for >> architectures with register windows, >> ???? where patching the return pc for deoptimization of a compiled >> frame was racy if the owner thread >> ???? was in native code. Instead a "deopt" suspend flag was set on >> which the thread patched its own >> ???? frame upon return from native. So no thread was suspended. It got >> its name only from the name of >> ???? the flags. >> >> [2] Discussion about using handshakes to sync. with the target thread: >> >> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 >> >> >> -----Original Message----- >> From: David Holmes >> Sent: Freitag, 13. Dezember 2019 00:56 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> Some further queries/concerns: >> >> src/hotspot/share/runtime/objectMonitor.cpp >> >> Can you please explain the changes to ObjectMonitor::wait: >> >> !?? _recursions = save????? // restore the old recursion count >> !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> increased by the deferred relock count >> >> what is the "deferred relock count"? I gather it relates to >> >> "The code was extended to be able to deoptimize objects of a frame that >> is not the top frame and to let another thread than the owning thread do >> it." >> >> which I don't like the sound of at all when it comes to ObjectMonitor >> state. So I'd like to understand in detail exactly what is going on here >> and why.? 
This is a very intrusive change that seems to badly break >> encapsulation and impacts future changes to ObjectMonitor that are under >> investigation. >> >> --- >> >> src/hotspot/share/runtime/thread.cpp >> >> Can you please explain why JavaThread::wait_for_object_deoptimization >> has to be handcrafted in this way rather than using proper transitions. >> >> We got rid of "deopt suspend" some time ago and it is disturbing to see >> it being added back (effectively). This seems like it may be something >> that handshakes could be used for. >> >> Thanks, >> David >> ----- >> >> On 12/12/2019 7:02 am, David Holmes wrote: >>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: >>>> Hi David, >>>> >>>> ??? > Most of the details here are in areas I can comment on in detail, >>>> but I >>>> ??? > did take an initial general look at things. >>>> >>>> Thanks for taking the time! >>> >>> Apologies the above should read: >>> >>> "Most of the details here are in areas I *can't* comment on in detail >>> ..." >>> >>> David >>> >>>> ??? > The only thing that jumped out at me is that I think the >>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>> ??? > >>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Yes, it should. Will add the method like above. >>>> >>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> ??? > active testing this will just bit-rot. >>>> >>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>> workload. I will add a minimal test >>>> to keep it fresh. >>>> >>>> ??? > Also on the tests I don't understand your @requires clause: >>>> ??? > >>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> ??? > (vm.opt.TieredCompilation != true)) >>>> ??? > >>>> ??? > This seems to require that TieredCompilation is disabled, but >>>> tiered is >>>> ??? > our normal mode of operation. ?? >>>> ??? > >>>> >>>> I removed the clause. I guess I wanted to target the tests towards the >>>> code they are supposed to >>>> test, and it's easier to analyze failures w/o tiered compilation and >>>> with just one compiler thread. >>>> >>>> Additionally I will make use of >>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>> >>>> Thanks, >>>> Richard. >>>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi Richard, >>>> >>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>> Hi, >>>>> >>>>> I would like to get reviews please for >>>>> >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>> >>>>> Corresponding RFE: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>> >>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>> >>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>> issues (thanks!). In addition the >>>>> change is being tested at SAP since I posted the first RFR some >>>>> months ago. 
>>>>> >>>>> The intention of this enhancement is to benefit performance wise from >>>>> escape analysis even if JVMTI >>>>> agents request capabilities that allow them to access local variable >>>>> values. E.g. if you start-up >>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>> escape analysis is disabled right >>>>> from the beginning, well before a debugger attaches -- if ever one >>>>> should do so. With the >>>>> enhancement, escape analysis will remain enabled until and after a >>>>> debugger attaches. EA based >>>>> optimizations are reverted just before an agent acquires the >>>>> reference to an object. In the JBS item >>>>> you'll find more details. >>>> >>>> Most of the details here are in areas I can comment on in detail, but I >>>> did take an initial general look at things. >>>> >>>> The only thing that jumped out at me is that I think the >>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>> >>>> +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> active testing this will just bit-rot. >>>> >>>> Also on the tests I don't understand your @requires clause: >>>> >>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> (vm.opt.TieredCompilation != true)) >>>> >>>> This seems to require that TieredCompilation is disabled, but tiered is >>>> our normal mode of operation. ?? >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>> >>>>> >>>>> From tobias.hartmann at oracle.com Tue Dec 17 07:39:42 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 17 Dec 2019 08:39:42 +0100 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Jatin, On 16.12.19 16:42, Bhateja, Jatin wrote: > Please find below the updated patch with the test case. > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ Thanks for adding the test. Some comments: - We try to avoid bug ids in test names. I would suggest a more descriptive name like "TestIrreducibleLoopWithVNNI". - Please also add the test to package compiler.loopopts - For Java code, we use 4 whitespace indentation - The variable 'c' is not used in 'mainTest' Thanks, Tobias From richard.reingruber at sap.com Tue Dec 17 10:24:53 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 17 Dec 2019 10:24:53 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> Message-ID: Hi Robbin, > Sorry I don't immediately see what issue there is in doing a handshake > instead of: > VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, > owned_monitors_list); VM_GetOwnedMonitorInfo /can/ be replaced by a handshake, but the calling_thread T1 needs to walk java_thread T2's stack /before/ to reallocate and relock objects, because the GC interface does not allow the VMThread to allocate from the java heap. T1: 1. reallocate scalar replaced objects of T2 // not possible as part of handshake/vmop, // because GC interface does not allow VMThread // to allocate from heap 2. 
execute VM_GetOwnedMonitorInfo() or equivalent handshake while T2 is /not/ pushing new frames with EA based optimizations. > > > > 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? > > > > 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking > > T2, because either T2 or the VMThread will block in L10. > > Yes, sorry, I forgot/confused myself about asynch handshake. > (I have a test prototype for that, which removes suspend flag) > > > > > I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could > > continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating > > objects triggers GC or attempting to execute the vm operation in > > JvmtiEnv::GetOwnedMonitorStackDepthInfo(). > > > > It might be impossible to replace my suspend flag with handshakes that are available today, because > > if it was you could replace all the suspend flags right away, couldn't you? > > So adding asynch handshakes and a per thread handshake queue, we can. > (which this test prototype does) Yes, should work for my use case too. > The issue I'm thinking of is if we need selective polling first. > Suspend flags are not checked in every transition, e.g. vm->native. > A JVM TI agent don't expect to suspend it's own thread when suspending > all threads. > (that thread would be suspended when trying to get back to agent code > when it does vm->native transition) Note that JVM TI doesn't offer "suspending all threads" directly. It offers SuspendThreadList [1] which can be used to self-suspend: "If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it" Thanks, Richard. [1] https://docs.oracle.com/en/java/javase/13/docs/specs/jvmti.html#SuspendThreadList -----Original Message----- From: Robbin Ehn Sent: Montag, 16. Dezember 2019 18:21 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, On 2019-12-16 14:41, Reingruber, Richard wrote: > Hi Robbin, > > first of all: thanks a lot for providing feedback. I do appreciate it. > > I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. > > Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] > > I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread > T1 would apply it on another thread T2. Sorry I don't immediately see what issue there is in doing a handshake instead of: VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, owned_monitors_list); > > 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? > > 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking > T2, because either T2 or the VMThread will block in L10. Yes, sorry, I forgot/confused myself about asynch handshake. (I have a test prototype for that, which removes suspend flag) > > I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could > continue and call wait_until_eb_stopped(). 
But returning from there T1 would block if reallocating > objects triggers GC or attempting to execute the vm operation in > JvmtiEnv::GetOwnedMonitorStackDepthInfo(). > > It might be impossible to replace my suspend flag with handshakes that are available today, because > if it was you could replace all the suspend flags right away, couldn't you? So adding asynch handshakes and a per thread handshake queue, we can. (which this test prototype does) The issue I'm thinking of is if we need selective polling first. Suspend flags are not checked in every transition, e.g. vm->native. A JVM TI agent don't expect to suspend it's own thread when suspending all threads. (that thread would be suspended when trying to get back to agent code when it does vm->native transition) > > Or I'm simply missing something... quite possible... :) No I think you got it right. Thanks, Robbin > > Thanks, Richard. > > [1] Drafted by Robbin (thanks!) > > 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { > 2 Semaphore _is_waiting; > 3 Semaphore _wait; > 4 bool _started; > 5 public: > 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > 7 _wait(0), _started(false) { } > 8 void do_thread(Thread* th) { > 9 _is_waiting.signal(); > 10 _wait.wait(); > 11 Atomic::store(&_started, true); > 12 } > 13 void wait_until_eb_stopped() { _is_waiting.wait(); } > 14 void start_thread() { > 15 _wait.signal(); > 16 while(!Atomic::load(&_started)) { > 17 os::naked_yield(); > 18 } > 19 } > 20 }; > > -----Original Message----- > From: Robbin Ehn > Sent: Montag, 16. Dezember 2019 11:21 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, as mentioned it would be better if you could do this with > handshakes, instead of using _suspend_flag (since they are going away). > But I can't think of a way doing it without blocking safepoints, so we need to > add some more features in handshakes first. > When possible I hope you are willing to move this code to handshakes instead. > > You could stop one thread with, e.g.: > class EscapeBarrierSuspendHandshake : public HandshakeClosure { > Semaphore _is_waiting; > Semaphore _wait; > bool _started; > public: > EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > _wait(0), _started(false) { } > void do_thread(Thread* th) { > _is_waiting.signal(); > _wait.wait(); > Atomic::store(&_started, true); > } > void wait_until_eb_stopped() { _is_waiting.wait(); } > void start_thread() { > _wait.signal(); > while(!Atomic::load(&_started)) { > os::naked_yield(); > } > } > }; > > But it would block safepoints. > > Thanks, Robbin > > On 12/10/19 10:45 PM, Reingruber, Richard wrote: >> Hi, >> >> I would like to get reviews please for >> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >> >> Corresponding RFE: >> https://bugs.openjdk.java.net/browse/JDK-8227745 >> >> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >> >> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >> change is being tested at SAP since I posted the first RFR some months ago. 
>> >> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >> agents request capabilities that allow them to access local variable values. E.g. if you start-up >> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >> from the beginning, well before a debugger attaches -- if ever one should do so. With the >> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >> you'll find more details. >> >> Thanks, >> Richard. >> >> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >> From robbin.ehn at oracle.com Tue Dec 17 11:01:03 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 17 Dec 2019 12:01:03 +0100 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> Message-ID: <08d4f482-36a0-6499-0546-2a888da2a094@oracle.com> Hi Richard, On 12/17/19 11:24 AM, Reingruber, Richard wrote: > > So adding asynch handshakes and a per thread handshake queue, we can. > > (which this test prototype does) > > Yes, should work for my use case too. Great. > > > The issue I'm thinking of is if we need selective polling first. > > Suspend flags are not checked in every transition, e.g. vm->native. > > A JVM TI agent don't expect to suspend it's own thread when suspending > > all threads. > > (that thread would be suspended when trying to get back to agent code > > when it does vm->native transition) > > Note that JVM TI doesn't offer "suspending all threads" directly. It offers SuspendThreadList [1] > which can be used to self-suspend: "If the calling thread is specified in the request_list array, > this function will not return until some other thread resumes it" Maybe there is a test-bug here or it was more complicated scenario. I have to investigate, but suspending threads in all transitions causes a chunk of test failure in jdi/jvmti. The issue was suspending threads going vm->native (back to agent code). Thanks, Robbin > > Thanks, Richard. > > [1] https://docs.oracle.com/en/java/javase/13/docs/specs/jvmti.html#SuspendThreadList > > -----Original Message----- > From: Robbin Ehn > Sent: Montag, 16. Dezember 2019 18:21 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, > > On 2019-12-16 14:41, Reingruber, Richard wrote: >> Hi Robbin, >> >> first of all: thanks a lot for providing feedback. I do appreciate it. >> >> I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. >> >> Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] >> >> I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread >> T1 would apply it on another thread T2. > > Sorry I don't immediately see what issue there is in doing a handshake > instead of: > VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, > owned_monitors_list); > >> >> 1. 
L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? >> >> 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking >> T2, because either T2 or the VMThread will block in L10. > > Yes, sorry, I forgot/confused myself about asynch handshake. > (I have a test prototype for that, which removes suspend flag) > >> >> I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could >> continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating >> objects triggers GC or attempting to execute the vm operation in >> JvmtiEnv::GetOwnedMonitorStackDepthInfo(). >> >> It might be impossible to replace my suspend flag with handshakes that are available today, because >> if it was you could replace all the suspend flags right away, couldn't you? > > So adding asynch handshakes and a per thread handshake queue, we can. > (which this test prototype does) > The issue I'm thinking of is if we need selective polling first. > Suspend flags are not checked in every transition, e.g. vm->native. > A JVM TI agent don't expect to suspend it's own thread when suspending > all threads. > (that thread would be suspended when trying to get back to agent code > when it does vm->native transition) > >> >> Or I'm simply missing something... quite possible... :) > > No I think you got it right. > > Thanks, Robbin > >> >> Thanks, Richard. >> >> [1] Drafted by Robbin (thanks!) >> >> 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { >> 2 Semaphore _is_waiting; >> 3 Semaphore _wait; >> 4 bool _started; >> 5 public: >> 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), >> 7 _wait(0), _started(false) { } >> 8 void do_thread(Thread* th) { >> 9 _is_waiting.signal(); >> 10 _wait.wait(); >> 11 Atomic::store(&_started, true); >> 12 } >> 13 void wait_until_eb_stopped() { _is_waiting.wait(); } >> 14 void start_thread() { >> 15 _wait.signal(); >> 16 while(!Atomic::load(&_started)) { >> 17 os::naked_yield(); >> 18 } >> 19 } >> 20 }; >> >> -----Original Message----- >> From: Robbin Ehn >> Sent: Montag, 16. Dezember 2019 11:21 >> To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents >> >> Hi Richard, as mentioned it would be better if you could do this with >> handshakes, instead of using _suspend_flag (since they are going away). >> But I can't think of a way doing it without blocking safepoints, so we need to >> add some more features in handshakes first. >> When possible I hope you are willing to move this code to handshakes instead. >> >> You could stop one thread with, e.g.: >> class EscapeBarrierSuspendHandshake : public HandshakeClosure { >> Semaphore _is_waiting; >> Semaphore _wait; >> bool _started; >> public: >> EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), >> _wait(0), _started(false) { } >> void do_thread(Thread* th) { >> _is_waiting.signal(); >> _wait.wait(); >> Atomic::store(&_started, true); >> } >> void wait_until_eb_stopped() { _is_waiting.wait(); } >> void start_thread() { >> _wait.signal(); >> while(!Atomic::load(&_started)) { >> os::naked_yield(); >> } >> } >> }; >> >> But it would block safepoints. 
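For readers following the thread, a rough sketch of the protocol the drafted closure implies: the
requester arms the handshake, waits until the target is parked in do_thread(), reverts the EA-based
optimizations, and then releases the target. Handshake::execute_async() below is a hypothetical
asynchronous entry point (today's two-thread handshakes are synchronous, which is exactly the
limitation discussed here); none of this is code from the webrev.

  // Illustration only; assumes an asynchronous handshake variant exists.
  void deoptimize_objects_of(JavaThread* target) {
    EscapeBarrierSuspendHandshake hs;
    Handshake::execute_async(&hs, target); // hypothetical: must not block the requester
    hs.wait_until_eb_stopped();            // target is now parked in do_thread()
    // ... reallocate scalar-replaced objects and relock eliminated locks ...
    hs.start_thread();                     // let the target continue
  }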
>> >> Thanks, Robbin >> >> On 12/10/19 10:45 PM, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to get reviews please for >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>> >>> Corresponding RFE: >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> >>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>> >>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >>> change is being tested at SAP since I posted the first RFR some months ago. >>> >>> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >>> agents request capabilities that allow them to access local variable values. E.g. if you start-up >>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >>> from the beginning, well before a debugger attaches -- if ever one should do so. With the >>> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >>> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >>> you'll find more details. >>> >>> Thanks, >>> Richard. >>> >>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>> From adinn at redhat.com Tue Dec 17 11:33:40 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 17 Dec 2019 11:33:40 +0000 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: <87eb867b-18f6-6b06-7b6b-450391066b7f@redhat.com> On 12/12/2019 19:14, Andrew Haley wrote: > This assertion failure happens because load/store patterns in > aarch64.ad are incorrect. > . . . > Andrew Dinn will probably have a cow when he sees this patch. :-) I am currently suffering birth pangs. I will get back to you with a proper review asap. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.x.ivanov at oracle.com Tue Dec 17 12:50:13 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Dec 2019 15:50:13 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 Message-ID: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-7175279 There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. Proposed patch removes the rest and makes x86-64 code x87-free. The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). x87 instructions are made available only on x86-32. C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. 
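As an aside on the java_lang_math_abs case mentioned above: the point of
StubRoutines::x86::double_sign_mask() is that |x| of an IEEE-754 double is just the value with its
sign bit cleared, so a single SSE2 ANDPD against a 0x7FFFFFFFFFFFFFFF mask replaces the old x87
fabs. A minimal C++ sketch of the idea follows; it is an illustration only, not the
template-interpreter code from the webrev.

  #include <cstdint>
  #include <cstring>

  // Clear the IEEE-754 sign bit, which is all "abs" has to do for doubles.
  // This is the idea the ANDPD against double_sign_mask() expresses in the stub.
  static inline double abs_via_sign_mask(double x) {
    uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);      // type-pun safely
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);     // drop the sign bit
    std::memcpy(&x, &bits, sizeof bits);
    return x;
  }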
Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8175916 [2] https://bugs.openjdk.java.net/browse/JDK-8136414 [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From jzaugg at gmail.com Tue Dec 17 13:15:42 2019 From: jzaugg at gmail.com (Jason Zaugg) Date: Tue, 17 Dec 2019 23:15:42 +1000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> Message-ID: On Fri, 13 Dec 2019 at 15:07, John Rose wrote: > I?m intrigued that you are interested in this, and I encourage you to consider > pulling on this string some more. I?ll help you pull if you want. > > I think this is doable, to a degree, as a starter project, to install new parameters > and do initial exploration of their settings. Actually dialing in the settings > and testing them across a range of workloads is a very specialized job, which > few of us are good at, but if the optimization seems to pan out we can find > the necessary kind of expert. > > Your first step, should you choose to accept this mission, would be to join > the community. Your name should be on http://openjdk.java.net/census. > See http://openjdk.java.net/contribute/. Thanks for the encouragement. I've submitted an OCA and will work on a patch. I'm past some initial tooling issues -- I can build an image and run jtreg.. I've added [1] a simple version of the analysis to the existing the flow analysis and hooked this into InlineTree::try_to_inline. I've added a new test in the same manner as inlining/InlineAccessors.java. I'll flesh this out and report back after Christmas. -jason [1] https://github.com/retronym/jdk/pull/1/ From jesper.wilhelmsson at oracle.com Tue Dec 17 14:26:29 2019 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Tue, 17 Dec 2019 15:26:29 +0100 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> Hi, This is a fairly large (wide spread) change. Is there any risk for conflicts with remaining work in JDK 14? In the interest of keeping forwardports as conflict free as possible, would it make sense to hold this change until the number of changes in 14 has dropped? Thanks, /Jesper > On 17 Dec 2019, at 13:50, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-7175279 > > There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. > > Proposed patch removes the rest and makes x86-64 code x87-free. > > The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. > > Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). > > x87 instructions are made available only on x86-32. > > C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. 
> > Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8175916 > > [2] https://bugs.openjdk.java.net/browse/JDK-8136414 > > [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From richard.reingruber at sap.com Tue Dec 17 14:47:37 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 17 Dec 2019 14:47:37 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> Message-ID: Hi David, > > > Some further queries/concerns: > > > > > > src/hotspot/share/runtime/objectMonitor.cpp > > > > > > Can you please explain the changes to ObjectMonitor::wait: > > > > > > ! _recursions = save // restore the old recursion count > > > ! + jt->get_and_reset_relock_count_after_wait(); // > > > increased by the deferred relock count > > > > > > what is the "deferred relock count"? I gather it relates to > > > > > > "The code was extended to be able to deoptimize objects of a > > frame that > > > is not the top frame and to let another thread than the owning > > thread do > > > it." > > > > Yes, these relate. Currently EA based optimizations are reverted, when a compiled frame is > > replaced with corresponding interpreter frames. Part of this is relocking objects with eliminated > > locking. New with the enhancement is that we do this also just before object references are > > acquired through JVMTI. In this case we deoptimize also the owning compiled frame C and we > > register deoptimized objects as deferred updates. When control returns to C it gets deoptimized, > > we notice that objects are already deoptimized (reallocated and relocked), so we don't do it again > > (relocking twice would be incorrect of course). Deferred updates are copied into the new > > interpreter frames. > > > > Problem: relocking is not possible if the target thread T is waiting on the monitor that needs to > > be relocked. This happens only with non-local objects with EliminateNestedLocks. Instead relocking > > is deferred until T owns the monitor again. This is what the piece of code above does. > > Sorry I need some more detail here. How can you wait() on an object > monitor if the object allocation and/or locking was optimised away? And > what is a "non-local object" in this context? Isn't EA restricted to > thread-confined objects? "Non-local object" is an object that escapes its thread. The issue I'm addressing with the changes in ObjectMonitor::wait are almost unrelated to EA. They are caused by EliminateNestedLocks, where C2 eliminates recursive locking of an already owned lock. The lock owning object exists on the heap, it is locked and you can call wait() on it. EliminateLocks is the C2 option that controls lock elimination based on EA. Both optimizations have in common that objects with eliminated locking need to be relocked when deoptimizing a frame, i.e. when replacing a compiled frame with equivalent interpreter frames. Deoptimization::relock_objects does that job for /all/ eliminated locks in scope. /All/ can be a mix of eliminated nested locks and locks of not-escaping objects. New with the enhancement: I call relock_objects earlier, just before objects pontentially escape. 
But then later when the owning compiled frame gets deoptimized, I must not do it again: See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp: 373 if ((jvmci_enabled || ((DoEscapeAnalysis || EliminateNestedLocks) && EliminateLocks)) 374 && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) { 375 bool unused; 376 eliminate_locks(thread, chunk, realloc_failures, deoptee, exec_mode, unused); 377 } Now when calling relock_objects early it is quiet possible that I have to relock an object the target thread currently waits for. Obviously I cannot relock in this case, instead I chose to introduce relock_count_after_wait to JavaThread. > Is it just that some of the locking gets optimized away e.g. > > synchronised(obj) { > synchronised(obj) { > synchronised(obj) { > obj.wait(); > } > } > } > > If this is reduced to a form as-if it were a single lock of the monitor > (due to EA) and the wait() triggers a JVM TI event which leads to the > escape of "obj" then we need to reconstruct the true lock state, and so > when the wait() internally unblocks and reacquires the monitor it has to > set the true recursion count to 3, not the 1 that it appeared to be when > wait() was initially called. Is that the scenario? Kind of... except that the locking is not eliminated due to EA and there is no JVM TI event triggered by wait. Add LocalObject l1 = new LocalObject(); in front of the synchrnized blocks and assume a JVM TI agent acquires l1. This triggers the code in question. See that relocking/reallocating is transactional. If it is done then for /all/ objects in scope and it is done at most once. It wouldn't be quite so easy to split this in relocking of nested/EA-based eliminated locks. > If so I find this truly awful. Anyone using wait() in a realistic form > requires a notification and so the object cannot be thread confined. In It is not thread confined. > which case I would strongly argue that upon hitting the wait() the deopt > should occur unconditionally and so the lock state is correct before we > wait and so we don't need to mess with the recursion count internally > when we reacquire the monitor. > > > > > > which I don't like the sound of at all when it comes to ObjectMonitor > > > state. So I'd like to understand in detail exactly what is going on here > > > and why. This is a very intrusive change that seems to badly break > > > encapsulation and impacts future changes to ObjectMonitor that are under > > > investigation. > > > > I would not regard this as breaking encapsulation. Certainly not badly. > > > > I've added a property relock_count_after_wait to JavaThread. The property is well > > encapsulated. Future ObjectMonitor implementations have to deal with recursion too. They are free > > in choosing a way to do that as long as that property is taken into account. This is hardly a > > limitation. > > I do think this badly breaks encapsulation as you have to add a callout > from the guts of the ObjectMonitor code to reach into the thread to get > this lock count adjustment. I understand why you have had to do this but > I would much rather see a change to the EA optimisation strategy so that > this is not needed. > > > Note also that the property is a straight forward extension of the existing concept of deferred > > local updates. It is embedded into the structure holding them. So not even the footprint of a > > JavaThread is enlarged if no deferred updates are generated. > > [...] 
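As a compact summary of the bookkeeping described above, here is a minimal sketch of the deferred
relock counter. The accessor name follows the quoted patch; the simplified bodies are an
illustration, not the actual webrev code (where the counter lives on JavaThread next to the other
deferred updates).

  // Sketch only: relocks that had to be postponed because the target thread
  // was waiting on the monitor in question.
  class DeferredRelockCount {
    int _relock_count_after_wait = 0;
  public:
    void inc_relock_count_after_wait() { ++_relock_count_after_wait; }
    int  get_and_reset_relock_count_after_wait() {
      int count = _relock_count_after_wait;
      _relock_count_after_wait = 0;   // consumed exactly once, on monitor reacquisition
      return count;
    }
  };

  // ObjectMonitor::wait() then restores the recursion count as
  //   _recursions = save + jt->get_and_reset_relock_count_after_wait();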
> > > > > I'm actually duplicating the existing external suspend mechanism, because a thread can be > > suspended at most once. And hey, and don't like that either! But it seems not unlikely that the > > duplicate can be removed together with the original and the new type of handshakes that will be > > used for thread suspend can be used for object deoptimization too. See today's discussion in > > JDK-8227745 [2]. > > I hope that discussion bears some fruit, at the moment it seems not to > be possible to use handshakes here. :( > > The external suspend mechanism is a royal pain in the proverbial that we > have to carefully live with. The idea that we're duplicating that for > use in another fringe area of functionality does not thrill me at all. > > To be clear, I understand the problem that exists and that you wish to > solve, but for the runtime parts I balk at the complexity cost of > solving it. I know it's complex, but by far no rocket science. Also I find it hard to imagine another fix for JDK-8233915 besides changing the JVM TI specification. Thanks, Richard. -----Original Message----- From: David Holmes Sent: Dienstag, 17. Dezember 2019 08:03 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; Vladimir Kozlov (vladimir.kozlov at oracle.com) Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents David On 17/12/2019 4:57 pm, David Holmes wrote: > Hi Richard, > > On 14/12/2019 5:01 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Some further queries/concerns: >> ?? > >> ?? > src/hotspot/share/runtime/objectMonitor.cpp >> ?? > >> ?? > Can you please explain the changes to ObjectMonitor::wait: >> ?? > >> ?? > !?? _recursions = save????? // restore the old recursion count >> ?? > !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> ?? > increased by the deferred relock count >> ?? > >> ?? > what is the "deferred relock count"? I gather it relates to >> ?? > >> ?? > "The code was extended to be able to deoptimize objects of a >> frame that >> ?? > is not the top frame and to let another thread than the owning >> thread do >> ?? > it." >> >> Yes, these relate. Currently EA based optimizations are reverted, when >> a compiled frame is replaced >> with corresponding interpreter frames. Part of this is relocking >> objects with eliminated >> locking. New with the enhancement is that we do this also just before >> object references are acquired >> through JVMTI. In this case we deoptimize also the owning compiled >> frame C and we register >> deoptimized objects as deferred updates. When control returns to C it >> gets deoptimized, we notice >> that objects are already deoptimized (reallocated and relocked), so we >> don't do it again (relocking >> twice would be incorrect of course). Deferred updates are copied into >> the new interpreter frames. >> >> Problem: relocking is not possible if the target thread T is waiting >> on the monitor that needs to be >> relocked. This happens only with non-local objects with >> EliminateNestedLocks. Instead relocking is >> deferred until T owns the monitor again. This is what the piece of >> code above does. > > Sorry I need some more detail here. How can you wait() on an object > monitor if the object allocation and/or locking was optimised away? And > what is a "non-local object" in this context? Isn't EA restricted to > thread-confined objects? 
> > Is it just that some of the locking gets optimized away e.g. > > synchronised(obj) { > ? synchronised(obj) { > ??? synchronised(obj) { > ????? obj.wait(); > ??? } > ? } > } > > If this is reduced to a form as-if it were a single lock of the monitor > (due to EA) and the wait() triggers a JVM TI event which leads to the > escape of "obj" then we need to reconstruct the true lock state, and so > when the wait() internally unblocks and reacquires the monitor it has to > set the true recursion count to 3, not the 1 that it appeared to be when > wait() was initially called. Is that the scenario? > > If so I find this truly awful. Anyone using wait() in a realistic form > requires a notification and so the object cannot be thread confined. In > which case I would strongly argue that upon hitting the wait() the deopt > should occur unconditionally and so the lock state is correct before we > wait and so we don't need to mess with the recursion count internally > when we reacquire the monitor. > >> >> ?? > which I don't like the sound of at all when it comes to >> ObjectMonitor >> ?? > state. So I'd like to understand in detail exactly what is going >> on here >> ?? > and why.? This is a very intrusive change that seems to badly break >> ?? > encapsulation and impacts future changes to ObjectMonitor that >> are under >> ?? > investigation. >> >> I would not regard this as breaking encapsulation. Certainly not badly. >> >> I've added a property relock_count_after_wait to JavaThread. The >> property is well >> encapsulated. Future ObjectMonitor implementations have to deal with >> recursion too. They are free in >> choosing a way to do that as long as that property is taken into >> account. This is hardly a >> limitation. > > I do think this badly breaks encapsulation as you have to add a callout > from the guts of the ObjectMonitor code to reach into the thread to get > this lock count adjustment. I understand why you have had to do this but > I would much rather see a change to the EA optimisation strategy so that > this is not needed. > >> Note also that the property is a straight forward extension of the >> existing concept of deferred >> local updates. It is embedded into the structure holding them. So not >> even the footprint of a >> JavaThread is enlarged if no deferred updates are generated. >> >> ?? > --- >> ?? > >> ?? > src/hotspot/share/runtime/thread.cpp >> ?? > >> ?? > Can you please explain why >> JavaThread::wait_for_object_deoptimization >> ?? > has to be handcrafted in this way rather than using proper >> transitions. >> ?? > >> >> I wrote wait_for_object_deoptimization taking >> JavaThread::java_suspend_self_with_safepoint_check >> as template. So in short: for the same reasons :) >> >> Threads reach both methods as part of thread state transitions, >> therefore special handling is >> required to change thread state on top of ongoing transitions. >> >> ?? > We got rid of "deopt suspend" some time ago and it is disturbing >> to see >> ?? > it being added back (effectively). This seems like it may be >> something >> ?? > that handshakes could be used for. >> >> Deopt suspend used to be something rather different with a similar >> name[1]. It is not being added back. > > I stand corrected. Despite comments in the code to the contrary > deopt_suspend didn't actually cause a self-suspend. I was doing a lot of > cleanup in this area 13 years ago :) > >> >> I'm actually duplicating the existing external suspend mechanism, >> because a thread can be suspended >> at most once. 
And hey, and don't like that either! But it seems not >> unlikely that the duplicate can >> be removed together with the original and the new type of handshakes >> that will be used for >> thread suspend can be used for object deoptimization too. See today's >> discussion in JDK-8227745 [2]. > > I hope that discussion bears some fruit, at the moment it seems not to > be possible to use handshakes here. :( > > The external suspend mechanism is a royal pain in the proverbial that we > have to carefully live with. The idea that we're duplicating that for > use in another fringe area of functionality does not thrill me at all. > > To be clear, I understand the problem that exists and that you wish to > solve, but for the runtime parts I balk at the complexity cost of > solving it. > > Thanks, > David > ----- > >> Thanks, Richard. >> >> [1] Deopt suspend was something like an async. handshake for >> architectures with register windows, >> ???? where patching the return pc for deoptimization of a compiled >> frame was racy if the owner thread >> ???? was in native code. Instead a "deopt" suspend flag was set on >> which the thread patched its own >> ???? frame upon return from native. So no thread was suspended. It got >> its name only from the name of >> ???? the flags. >> >> [2] Discussion about using handshakes to sync. with the target thread: >> >> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 >> >> >> -----Original Message----- >> From: David Holmes >> Sent: Freitag, 13. Dezember 2019 00:56 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> Some further queries/concerns: >> >> src/hotspot/share/runtime/objectMonitor.cpp >> >> Can you please explain the changes to ObjectMonitor::wait: >> >> !?? _recursions = save????? // restore the old recursion count >> !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> increased by the deferred relock count >> >> what is the "deferred relock count"? I gather it relates to >> >> "The code was extended to be able to deoptimize objects of a frame that >> is not the top frame and to let another thread than the owning thread do >> it." >> >> which I don't like the sound of at all when it comes to ObjectMonitor >> state. So I'd like to understand in detail exactly what is going on here >> and why.? This is a very intrusive change that seems to badly break >> encapsulation and impacts future changes to ObjectMonitor that are under >> investigation. >> >> --- >> >> src/hotspot/share/runtime/thread.cpp >> >> Can you please explain why JavaThread::wait_for_object_deoptimization >> has to be handcrafted in this way rather than using proper transitions. >> >> We got rid of "deopt suspend" some time ago and it is disturbing to see >> it being added back (effectively). This seems like it may be something >> that handshakes could be used for. >> >> Thanks, >> David >> ----- >> >> On 12/12/2019 7:02 am, David Holmes wrote: >>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: >>>> Hi David, >>>> >>>> ??? > Most of the details here are in areas I can comment on in detail, >>>> but I >>>> ??? > did take an initial general look at things. >>>> >>>> Thanks for taking the time! 
>>> >>> Apologies the above should read: >>> >>> "Most of the details here are in areas I *can't* comment on in detail >>> ..." >>> >>> David >>> >>>> ??? > The only thing that jumped out at me is that I think the >>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>> ??? > >>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Yes, it should. Will add the method like above. >>>> >>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> ??? > active testing this will just bit-rot. >>>> >>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>> workload. I will add a minimal test >>>> to keep it fresh. >>>> >>>> ??? > Also on the tests I don't understand your @requires clause: >>>> ??? > >>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> ??? > (vm.opt.TieredCompilation != true)) >>>> ??? > >>>> ??? > This seems to require that TieredCompilation is disabled, but >>>> tiered is >>>> ??? > our normal mode of operation. ?? >>>> ??? > >>>> >>>> I removed the clause. I guess I wanted to target the tests towards the >>>> code they are supposed to >>>> test, and it's easier to analyze failures w/o tiered compilation and >>>> with just one compiler thread. >>>> >>>> Additionally I will make use of >>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>> >>>> Thanks, >>>> Richard. >>>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi Richard, >>>> >>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>> Hi, >>>>> >>>>> I would like to get reviews please for >>>>> >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>> >>>>> Corresponding RFE: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>> >>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>> >>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>> issues (thanks!). In addition the >>>>> change is being tested at SAP since I posted the first RFR some >>>>> months ago. >>>>> >>>>> The intention of this enhancement is to benefit performance wise from >>>>> escape analysis even if JVMTI >>>>> agents request capabilities that allow them to access local variable >>>>> values. E.g. if you start-up >>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>> escape analysis is disabled right >>>>> from the beginning, well before a debugger attaches -- if ever one >>>>> should do so. With the >>>>> enhancement, escape analysis will remain enabled until and after a >>>>> debugger attaches. EA based >>>>> optimizations are reverted just before an agent acquires the >>>>> reference to an object. In the JBS item >>>>> you'll find more details. >>>> >>>> Most of the details here are in areas I can comment on in detail, but I >>>> did take an initial general look at things. >>>> >>>> The only thing that jumped out at me is that I think the >>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>> >>>> +? 
bool is_hidden_from_external_view() const { return true; } >>>> >>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> active testing this will just bit-rot. >>>> >>>> Also on the tests I don't understand your @requires clause: >>>> >>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> (vm.opt.TieredCompilation != true)) >>>> >>>> This seems to require that TieredCompilation is disabled, but tiered is >>>> our normal mode of operation. ?? >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>> >>>>> >>>>> From vladimir.x.ivanov at oracle.com Tue Dec 17 15:02:38 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Dec 2019 18:02:38 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> Message-ID: Hi Jesper, > This is a fairly large (wide spread) change. Is there any risk for conflicts with remaining work in JDK 14? > In the interest of keeping forwardports as conflict free as possible, would it make sense to hold this change until the number of changes in 14 has dropped? I consider the risk of merge conflicts as low, since the bulk of changes touch C1 (and it isn't usually changed much). But I'm fine with waiting until the rate of fixes in JDK 14 drops (after RDP2?). Best regards, Vladimir Ivanov >> On 17 Dec 2019, at 13:50, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-7175279 >> >> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. >> >> Proposed patch removes the rest and makes x86-64 code x87-free. >> >> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. >> >> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >> >> x87 instructions are made available only on x86-32. >> >> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. >> >> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >> >> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >> >> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp > From jesper.wilhelmsson at oracle.com Tue Dec 17 15:38:55 2019 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Tue, 17 Dec 2019 16:38:55 +0100 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> Message-ID: <722D0395-9B89-490C-A6C4-5604AFFDA94B@oracle.com> > On 17 Dec 2019, at 16:02, Vladimir Ivanov wrote: > > Hi Jesper, > >> This is a fairly large (wide spread) change. Is there any risk for conflicts with remaining work in JDK 14? 
>> In the interest of keeping forwardports as conflict free as possible, would it make sense to hold this change until the number of changes in 14 has dropped? > > I consider the risk of merge conflicts as low, since the bulk of changes touch C1 (and it isn't usually changed much). > > But I'm fine with waiting until the rate of fixes in JDK 14 drops (after RDP2?). Yes, once we enter RDP2 there shouldn't be many changes left for JDK 14. Thank you! /Jesper > > Best regards, > Vladimir Ivanov > >>> On 17 Dec 2019, at 13:50, Vladimir Ivanov wrote: >>> >>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>> >>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. >>> >>> Proposed patch removes the rest and makes x86-64 code x87-free. >>> >>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. >>> >>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>> >>> x87 instructions are made available only on x86-32. >>> >>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. >>> >>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>> >>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>> >>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From adinn at redhat.com Tue Dec 17 16:39:41 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 17 Dec 2019 16:39:41 +0000 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: On 12/12/2019 19:14, Andrew Haley wrote: > In a rather belt-and-braces way I've also added some code that fixes > up illegal addresses. I did consider removing it, but I left it in > because it doesn't hurt. If we ever do generate similar illegal > addresses, debug builds will assert. I'm not sure whether to keep this > or not. > > Andrew Dinn will probably have a cow when he sees this patch. :-) > > OK for HEAD? Well, that wasn't quite so painful as it sounded. Code Review: The repeated warning comments are a very good idea. I spotted two things in aarch64.ad: 1) Range rule: 7115 // Load Range 7116 instruct loadRange(iRegINoSp dst, memory8 mem) 7117 %{ 7118 match(Set dst (LoadRange mem)); 7119 7120 ins_cost(4 * INSN_COST); 7121 format %{ "ldrw $dst, $mem\t# range" %} 7122 7123 ins_encode(aarch64_enc_ldrw(dst, mem)); 7124 7125 ins_pipe(iload_reg_mem); 7126 %} I think that should be memory4? Also, should you maybe change the comment to // Load Range (32 bit signed) 2) Pop count rules The following rule uses 'memory4' to declare the memory op but passes 'sizeof (jfloat)' to the ldrs call. Would the latter not be better passed as 4? (n.b. you use the matching numeric literal as argument to loadStore in the encoding class definitions). 
8161 instruct popCountI_mem(iRegINoSp dst, memory4 mem, vRegF tmp) %{8162 predicate(UsePopCountInstruction); 8163 match(Set dst (PopCountI (LoadI mem))); 8164 effect(TEMP tmp); 8165 ins_cost(INSN_COST * 13); 8166 8167 format %{ "ldrs $tmp, $mem\n\t" 8168 "cnt $tmp, $tmp\t# vector (8B)\n\t" 8169 "addv $tmp, $tmp\t# vector (8B)\n\t" 8170 "mov $dst, $tmp\t# vector (1D)" %} 8171 ins_encode %{ 8172 FloatRegister tmp_reg = as_FloatRegister($tmp$$reg); 8173 loadStore(MacroAssembler(&cbuf), &MacroAssembler::ldrs, tmp_reg, $mem->opcode(), 8174 as_Register($mem$$base), $mem$$index, $mem$$scale, $mem$$disp, sizeof (jfloat)); 8175 __ cnt($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister); 8176 __ addv($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister); 8177 __ mov($dst$$Register, $tmp$$FloatRegister, __ T1D, 0); 8178 %} . . . The same question applied for the next definition 8204 instruct popCountL_mem(iRegINoSp dst, memory8 mem, vRegD tmp) %{ . . . Yes, I too do not like the look of legitimize_address and it ought not to be necessary, given that the assert in debug mode ought to stop us hitting this in product mode. Still belt and braces is always a good thing so I'm happy for to stay (and do no harm). Testing: Was there a rationale for picking those specific offsets for the accesses? Anyway, I'm assuming the test didn't crash the JVM ;-) So, the only real issue is the size in the Range rule. I guess you didn't trigger a case where that mattered. Modulo that it is good to push. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.kozlov at oracle.com Tue Dec 17 17:45:18 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 09:45:18 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: Finally! Very good cleanup. Few notes. c1_CodeStubs.hpp - I think it should be stronger than assert to catch it in product too (we can do check in product because it is not performance critical code). c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 Thanks, Vladimir On 12/17/19 4:50 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-7175279 > > There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 > code base. > > Proposed patch removes the rest and makes x86-64 code x87-free. > > The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for > JEP 306 [1] and related enhancements [2]. > > Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses > StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). > > x87 instructions are made available only on x86-32. > > C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] > x86-32-only. > > Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. 
> > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8175916 > > [2] https://bugs.openjdk.java.net/browse/JDK-8136414 > > [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From jatin.bhateja at intel.com Tue Dec 17 18:58:38 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Tue, 17 Dec 2019 18:58:38 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Tobias, Please find updated patch at following link. http://cr.openjdk.java.net/~jbhateja/8230185/webrev.03/ Thanks, Jatin > -----Original Message----- > From: Tobias Hartmann > Sent: Tuesday, December 17, 2019 1:10 PM > To: Bhateja, Jatin > Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov > > Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class > > Hi Jatin, > > On 16.12.19 16:42, Bhateja, Jatin wrote: > > Please find below the updated patch with the test case. > > > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ > > Thanks for adding the test. Some comments: > - We try to avoid bug ids in test names. I would suggest a more descriptive > name like "TestIrreducibleLoopWithVNNI". > - Please also add the test to package compiler.loopopts > - For Java code, we use 4 whitespace indentation > - The variable 'c' is not used in 'mainTest' > > Thanks, > Tobias From john.r.rose at oracle.com Tue Dec 17 22:00:09 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 17 Dec 2019 14:00:09 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> Message-ID: On Dec 13, 2019, at 2:44 AM, Vladimir Ivanov wrote: > > Also, I'd like to add that it's fine to fix it in incrementally: start with something simple and reliable, and then explore more complex extensions on top of it. +1 > I don't think it's necessary to bytecode analysis too far. I agree with you that some point it becomes simpler to just parse the method and observe the effects than trying to derive them directly from bytecode. The reason I point out ciTypeFlow is that it already parses the bytecodes. It does this mainly to build a CFG, but given the framework it is easy to add other small chores. I see counting instructions, by kind, as a small chore. Another place where such counting could be added is the construction of ciMethodData. It seems reasonable to me to associate static metrics with a method at the same time as setting up dynamic metric collection. Jason noted JDK-8056071 as an example of bytecode instruction scanning. The fix in that bug is to extend the specialized recognizers of bytecodes.*pp to detect ?tiny methods?. If I had to choose between adding more of those recognizers and adding extra chores to the ciMethodData or ciTypeFlow passes, I?d prefer to do the latter, because the instruction classification is already written and debugged for the full passes. The recognizers in bytecodes.*pp always bothered me a little, that they are thrown off by tiny variations in the code. A counting technique that looks at all instructions is more robust, since it can more easily discount simple instructions like data movement and constant materialization. > (I fully agree that it requires significant effort to enable IR-based analysis in C2. 
> But also there are ways to workaround that and cache the analysis results
> across compilations: gather data during stand-alone compilation and then
> reuse it when doing inlining.)

I'd like to mine out more of the potential in adding those "extra chores" to
existing passes before doing IR caching. In fact, I see IR caching as one of
those big investments that we should do in C2's Java-based successor. Caching
is a near cousin to serialization, and that is much more easily done on
reflective Java data than on C++ data. (Or maybe I missed your point?)
The C2 IR is not garbage collected, but rather thread-confined and deleted
wholesale after each compilation task. Sharing across compilation tasks is a
hard problem, as we found when we built the CI.

-- John

From john.r.rose at oracle.com Tue Dec 17 22:13:50 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 17 Dec 2019 14:13:50 -0800
Subject: RFR: 8234863: Increase default value of MaxInlineLevel
In-Reply-To:
References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com>
Message-ID: <059C0946-7C30-43C1-BBD8-AE496360C2A1@oracle.com>

On Dec 17, 2019, at 5:15 AM, Jason Zaugg wrote:
>
> Thanks for the encouragement. I've submitted an OCA and will work on a
> patch.

Very good!

> I'm past some initial tooling issues -- I can build an image and run jtreg.. I've
> added [1] a simple version of the analysis to the existing the flow analysis and
> hooked this into InlineTree::try_to_inline. I've added a new test in the same
> manner as inlining/InlineAccessors.java.

This is on the right track. Try to find the simple wins, of course.
And please see my previous message in this thread to Vladimir.

One place to consider piggy-backing the counting techniques in ciTypeFlow is
in apply_one_bytecode. (Counts should only be accumulated the first time
through each block.) The tricky part would be factoring the code so that the
counting logic would not obscure the type flowing logic, but I think this is
doable.

The reason I like this possibility is that ciTypeFlow ignores some unreached
instructions. That potentially covers some cases of assertion code which is
turned off, and exception processing code which has never (yet) been called,
and would force a deoptimization if it were called into service. These cases
are notorious for harming inlining. If we could say "this is a tiny method,
except for assertion and exception processing code that never runs", we could
build a more robust inlining policy.

Something similar might be doable with MethodData, earlier in execution; I
haven't thought it through. Perhaps MethodData could associate instruction
kind counts with reachable points in the bytecode, which other clients,
including ciTypeFlow, could make use of. I think ciTF is a better cut point.

> I'll flesh this out and report back after Christmas.

Nice; let's see where it goes. Despite all my musings about abstract
interpretation, do start small. Even small is hard.

-- John

From vladimir.kozlov at oracle.com Wed Dec 18 00:09:10 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 17 Dec 2019 16:09:10 -0800
Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class
In-Reply-To:
References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com>
Message-ID: <8edd881e-259a-c479-b9ac-0afb99d8511e@oracle.com>

This looks good to me. I will leave final review and sponsoring to Tobias.

Thanks,
Vladimir

On 12/17/19 10:58 AM, Bhateja, Jatin wrote:
> Hi Tobias,
>
> Please find updated patch at following link.
> > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.03/ > > Thanks, > Jatin > >> -----Original Message----- >> From: Tobias Hartmann >> Sent: Tuesday, December 17, 2019 1:10 PM >> To: Bhateja, Jatin >> Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov >> >> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class >> >> Hi Jatin, >> >> On 16.12.19 16:42, Bhateja, Jatin wrote: >>> Please find below the updated patch with the test case. >>> >>> http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ >> >> Thanks for adding the test. Some comments: >> - We try to avoid bug ids in test names. I would suggest a more descriptive >> name like "TestIrreducibleLoopWithVNNI". >> - Please also add the test to package compiler.loopopts >> - For Java code, we use 4 whitespace indentation >> - The variable 'c' is not used in 'mainTest' >> >> Thanks, >> Tobias From vladimir.kozlov at oracle.com Wed Dec 18 01:06:30 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 17:06:30 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Message-ID: <3bc591f0-6041-2b95-bcb2-26a00c190922@oracle.com> On 12/16/19 2:29 PM, Vladimir Ivanov wrote: > Hi Vladimir, > > Updated version: > ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ Good. > > What's changed: > ? * Added more comments > ? * Fixed missing cases (Repl4B_imm() and Repl8B_imm) > ??? * Double-checked that there are no other missing cases left: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt > > I'd like to reiterate that I deliberately don't want to spend too much time polishing the current version because I > consider it as an interim point and not as the optimal and desired one (though it's definitely much better than where we > started both in number of instructions and clarity). I was about to comment changes for Long values but I agree that what is dine is enough. > > The shape I want to see in the next couple of iterations is the following: > > ? (1) all the operation implementations encapsulated in MacroAssembler > ??? * CPU dispatching will happen there, not in AD file; Yes, we discussed it before - asm instructions selection should be done in MacroAssembler (it is here exactly for that purpose). > > ? (2) get rid of vec vs legVec separation as much as possible > ??? * one of the ways to fix it is to introduce additional operand types (for example, vecBW == {legVec when avx512() > && !avx512bw(); vec, otherwise}) It would be nice. > > It would turn current ReplB_reg & ReplB_reg_leg into: > > instruct ReplB_reg(vecBW dst, scalar src) %{ > ? match(Set dst (ReplicateB src)); > ? format %{ "replicate $dst,$src" %} > ? ins_encode %{ > ??? uint vlen = vector_length(this); > ??? __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); > ? %} > %} > > where MacroAssembler::replicate_byte() hides all the dispatching logic against CPU capabilities and vector length. > > Moreover, it opens additional merging opportunities. As an example: > > instruct ReplBS_reg(vecBW dst, rRegI src) %{ > ? match(Set dst (ReplicateB src)); > ? match(Set dst (ReplicateS src)); > ? format %{ "replicate $dst,$src" %} > ? ins_encode %{ > ??? uint vlen = vector_length(this); > ??? switch (ideal_Opcode()) { > ?????? 
case Op_ReplicateB: __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); break; > ?????? case Op_ReplicateS: __ replicate_short($dst$$XMMRegister, $src$$Register, vlen); break; > ?????? default: ShouldNotReachHere(); > ??? } > ? %} > ? ins_pipe( pipe_slow ); > %} > > If we agree that this is the direction we want to move, splitting instructions is counter-productive and just pushes > more work for later iterations. Same applies to dispatching logic on CPU & vector length. Yes, I agree. Thanks, Vladimir > > Best regards, > Vladimir Ivanov > > On 14.12.2019 01:18, Vladimir Kozlov wrote: >> On 12/13/19 2:27 AM, Vladimir Ivanov wrote: >>> Thanks for the feedback, Vladimir. >>> >>>> replicateB >>>> >>>> Can you fold it differently? >>>> >>>> ReplB_reg_leg >>> >>> Are you talking about ReplB_reg? >> >> Yes, I was talking about ReplB_reg. I thought we can combine all [8-64] length vectors but I missed that ReplB_reg_leg >> uses legVec and needs separate instructions :( >> >> And I wanted to separate instruction which use avx 512 (evpbroadcastb) because it is difficult to see relation between >> predicate condition (length() == 64 && VM_Version::supports_avx512bw()) and check in code (vlen == 64 || >> VM_Version::supports_avx512vlbw()). First, || vs &&. Second, avx512bw vs avx512vlbw. May be better to have a separate >> instruction for this. >> >>> >>>> ?? predicate(!VM_Version::supports_avx512vlbw()); >>> >>> 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >>> 3152?? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); >>> >>> For ReplB_reg_leg the predicate can't be simplified: it is applicable only to 512bit vector when AVX512BW is absent. >>> Otherwise, legVec constraint will be unnecessarily applied to other configurations. >> >> That is why you replaced !avx512vlbw with !avx512bw? >> May be this section of code need comment which explains why one or an other is used. >> >>> >>> 3119 instruct ReplB_reg(vec dst, rRegI src) %{ >>> 3120?? predicate((n->as_Vector()->length() <= 32) || >>> 3121???????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); >>> >>> For ReplB_reg there's a shorter version: >>> >>> predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); >>> >>> >>> But do you find it easier to read? For example, when you are checking that all configurations are covered: >>> >>> predicate(n->as_Vector()->length() <= 32 || >>> ?????????? VM_Version::supports_avx512bw()); >>> >>> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >>> ?? predicate(n->as_Vector()->length() == 64 && >>> ??????????? !VM_Version::supports_avx512bw()); >>> >>> vs >>> >>> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >>> ?? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); >>> >>> instruct ReplB_reg(vec dst, rRegI src) %{ >>> ?? predicate((n->as_Vector()->length() <= 32) || >>> ???????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); >> >> I think next conditions in predicates require comment about using avx512vlbw and avx512bw. >> >> !?? predicate((n->as_Vector()->length() <= 32 && VM_Version::supports_avx512vlbw()) || >> !???????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); >> >>> >>> >>>> ?? ins_encode %{ >>>> ???? uint vlen = vector_length(this); >>>> ???? __ movdl($dst$$XMMRegister, $src$$Register); >>>> ???? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); >>>> ???? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); >>>> ???? 
if (vlen > 8) { >>>> ?????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); >>>> ?????? if (vlen > 16) { >>>> ???????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); >>>> ???????? if (vlen > 32) { >>>> ?????????? assert(vlen == 64, "sanity"); >>>> ?????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, $dst$$XMMRegister, 0x1); >>> >>> Yes, it should work as well. Do you find it easier to read though? >> >> Code is smaller. >> >>> >>>> Similar ReplB_imm_leg for which I don't see new implementation. >>> >>> Good catch. Added it back. >>> >>> (FTR completeness for reg2reg variants (_reg*) is mandatory. But for _mem and _imm it is optional: if they some >>> configuration isn't covered, _reg is used. But it was in original code, so I added it back.) >> >> I don't see code which was in Repl4B_imm() and Repl8B_imm() (only movdl and movq without vpbroadcastb). >> >>> >>> Updated version: >>> ?? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 >> >> Should be webrev.01 >> >>> >>> Another thing I noticed is that for ReplI/.../ReplD cases avx512vl checks are not necessary: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ >>> >>> The code assumes that evpbroadcastd/evpbroadcastq, vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need >>> AVX512VL for 512bit case, but Intel manual says AVX512F is enough. >>> >>> I plan to handle it as a separate change, but let me know if you want to incorporate it into 8235825. >> >> Yes, lets do it separately. >> >>> >>>> It should also simplify code for avx512 which one or 2 instructions. >>> >>> Can you elaborate, please? Are you talking about the case when version for different vector sizes differ in 1-2 >>> instructions (like ReplB_reg)? >> No, I was talking about cases when evpbroadcastb and vpbroadcastb instructions are used. I was think to have them in >> separate instructions. In your latest version it would be only evpbroadcastb case from ReplB_reg(). >> >> Thanks, >> Vladimir >> >>> >>>> Other types changes can be done same way. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8235825 >>>>> >>>>> Merge AD instructions for the following vector nodes: >>>>> ?? - ReplicateB, ..., ReplicateD >>>>> >>>>> Individual patches: >>>>> >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >>>>> >>>>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >>>>> >>>>> Contributed-by: Jatin Bhateja >>>>> Reviewed-by: vlivanov, sviswanathan, ? >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov From john.r.rose at oracle.com Wed Dec 18 01:40:17 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 17 Dec 2019 17:40:17 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Message-ID: Reviewed again. It reads well. I agree about future steps. ? 
John On Dec 16, 2019, at 2:29 PM, Vladimir Ivanov wrote: > > Updated version: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ > > What's changed: > * Added more comments > * Fixed missing cases (Repl4B_imm() and Repl8B_imm) > * Double-checked that there are no other missing cases left: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt From vladimir.kozlov at oracle.com Wed Dec 18 02:40:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 18:40:08 -0800 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: Message-ID: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> CCing to Runtime group. For me the use of `_print_inlining_stream->~stringStream()` is not obvious. I would definitively miss to do that if I use stringStreams in some new code. May be someone can suggest some C++ trick to do that automatically. Thanks, Vladimir On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > Hi, > > I'm resending this with fixed bugId ... > Sorry! > > Best regards, > Goetz > >> Hi, >> >> PrintInlining and TraceLoopPredicate allocate stringStreams with new and >> relied on the fact that all memory used is on the ResourceArea cleaned >> after the compilation. >> >> Since 8224193 the char* of the stringStream is malloced and thus >> must be freed. No doing so manifests a memory leak. >> This is only relevant if the corresponding tracing is active. >> >> To fix TraceLoopPredicate I added the destructor call >> Fixing PrintInlining is a bit more complicated, as it uses several >> stringStreams. A row of them is in a GrowableArray which must >> be walked to free all of them. >> As the GrowableArray is on an arena no destructor is called for it. >> >> I also changed some as_string() calls to base() calls which reduced >> memory need of the traces, and added a comment explaining the >> constructor of GrowableArray that calls the copyconstructor for its >> elements. >> >> Please review: >> http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/01/ >> >> Best regards, >> Goetz. > From vladimir.kozlov at oracle.com Wed Dec 18 03:18:35 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 19:18:35 -0800 Subject: [14] RFR (S) 8236000: VM build without C2 fails Message-ID: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8236000 C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 found other issues which were fixed too: # Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 # assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked and # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 # assert(DerivedPointerTable::is_active()) failed: Sanity Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci and tier1. Thanks, Vladimir From david.holmes at oracle.com Wed Dec 18 03:33:26 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Dec 2019 13:33:26 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. 
In-Reply-To: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > CCing to Runtime group. > > For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > I would definitively miss to do that if I use stringStreams in some new > code. But that is not a problem added by this changeset, the problem is that we're not deallocating these stringStreams even though we should be. If you use a stringStream in new code you have to manage its lifecycle. That said why is this: if (_print_inlining_stream != NULL) _print_inlining_stream->~stringStream(); not just: delete _print_inlining_stream; ? Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly calling the destructor rather than calling delete? Cheers, David > May be someone can suggest some C++ trick to do that automatically. > Thanks, > Vladimir > > On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> I'm resending this with fixed bugId ... >> Sorry! >> >> Best regards, >> ?? Goetz >> >>> Hi, >>> >>> PrintInlining and TraceLoopPredicate allocate stringStreams with new and >>> relied on the fact that all memory used is on the ResourceArea cleaned >>> after the compilation. >>> >>> Since 8224193 the char* of the stringStream is malloced and thus >>> must be freed. No doing so manifests a memory leak. >>> This is only relevant if the corresponding tracing is active. >>> >>> To fix TraceLoopPredicate I added the destructor call >>> Fixing PrintInlining is a bit more complicated, as it uses several >>> stringStreams. A row of them is in a GrowableArray which must >>> be walked to free all of them. >>> As the GrowableArray is on an arena no destructor is called for it. >>> >>> I also changed some as_string() calls to base() calls which reduced >>> memory need of the traces, and added a comment explaining the >>> constructor of GrowableArray that calls the copyconstructor for its >>> elements. >>> >>> Please review: >>> http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/01/ >>> >>> Best regards, >>> ?? Goetz. >> From vladimir.kozlov at oracle.com Wed Dec 18 03:38:38 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 19:38:38 -0800 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <87o8w8ptsq.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> Message-ID: Very nice! cfgnode.cpp - should we also check for is_top() to set `doit = false` and bailout? Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. Did you find some issues? Thanks, Vladimir On 12/16/19 12:18 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8231291/webrev.01/ > > As discussed before: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/035094.html > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036171.html > > Fully unrolling loops early helps EA. The change to cfgnode.cpp is > required because full unroll sometimes needs peeling which may add a phi > between a memory access and its AddP, a pattern that EA doesn't > recognize. > > Roland. 
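A minimal sketch of the stringStream lifecycle being discussed in the JDK-8235998 thread above, assuming (as Goetz's description indicates) that the stream object itself is resource- or arena-allocated while, since JDK-8224193, its character buffer is malloc'ed; the names and the print call are illustrative only, not taken from the webrev:

    stringStream* st = new stringStream();   // ResourceObj: the object itself lands on the resource area
    st->print("inlining trace");
    tty->print_raw(st->base());              // base() avoids the extra copy that as_string() would make
    st->~stringStream();                     // releases only the malloc'ed character buffer

Calling delete on such a stream would also try to reclaim the object's own storage, which belongs to the resource area or arena, so running just the destructor appears to be the way to free the buffer without breaking the allocation scheme.
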
> From kim.barrett at oracle.com Wed Dec 18 04:09:16 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Dec 2019 23:09:16 -0500 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: > On Dec 17, 2019, at 10:18 PM, Vladimir Kozlov wrote: > > https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8236000 > > C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 found other issues which were fixed too: > > # Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 > # assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked > > and > > # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 > # assert(DerivedPointerTable::is_active()) failed: Sanity > > Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci > and tier1. > > Thanks, > Vladimir Looks good. From vladimir.kozlov at oracle.com Wed Dec 18 04:34:34 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 20:34:34 -0800 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: <3819390f-c0fe-5752-b52e-9f957ab04830@oracle.com> Thank you, Kim Vladimir On 12/17/19 8:09 PM, Kim Barrett wrote: >> On Dec 17, 2019, at 10:18 PM, Vladimir Kozlov wrote: >> >> https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8236000 >> >> C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 found other issues which were fixed too: >> >> # Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 >> # assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked >> >> and >> >> # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 >> # assert(DerivedPointerTable::is_active()) failed: Sanity >> >> Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci >> and tier1. >> >> Thanks, >> Vladimir > > Looks good. > From tobias.hartmann at oracle.com Wed Dec 18 06:59:29 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 18 Dec 2019 07:59:29 +0100 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: Hi Vladimir, looks good to me. Best regards, Tobias On 18.12.19 04:18, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8236000 > > C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 > found other issues which were fixed too: > > #? Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 > #? assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked > > and > > # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 > # assert(DerivedPointerTable::is_active()) failed: Sanity > > Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci > and tier1. 
> > Thanks, > Vladimir From tobias.hartmann at oracle.com Wed Dec 18 07:06:57 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 18 Dec 2019 08:06:57 +0100 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: <8edd881e-259a-c479-b9ac-0afb99d8511e@oracle.com> References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> <8edd881e-259a-c479-b9ac-0afb99d8511e@oracle.com> Message-ID: Looks good to me too. I'll sponsor. Best regards, Tobias On 18.12.19 01:09, Vladimir Kozlov wrote: > This looks good to me. I will leave final review and sponsoring to Tobias. > > Thanks, > Vladimir > > On 12/17/19 10:58 AM, Bhateja, Jatin wrote: >> Hi Tobias, >> >> Please find updated patch at following link. >> >> http://cr.openjdk.java.net/~jbhateja/8230185/webrev.03/ >> >> Thanks, >> Jatin >> >>> -----Original Message----- >>> From: Tobias Hartmann >>> Sent: Tuesday, December 17, 2019 1:10 PM >>> To: Bhateja, Jatin >>> Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov >>> >>> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class >>> >>> Hi Jatin, >>> >>> On 16.12.19 16:42, Bhateja, Jatin wrote: >>>> Please find below the updated patch with the test case. >>>> >>>> http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ >>> >>> Thanks for adding the test. Some comments: >>> - We try to avoid bug ids in test names. I would suggest a more descriptive >>> name like "TestIrreducibleLoopWithVNNI". >>> - Please also add the test to package compiler.loopopts >>> - For Java code, we use 4 whitespace indentation >>> - The variable 'c' is not used in 'mainTest' >>> >>> Thanks, >>> Tobias From vladimir.x.ivanov at oracle.com Wed Dec 18 09:34:18 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 18 Dec 2019 12:34:18 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Message-ID: Thanks for the reviews, Vladimir and John. Best regards, Vladimir Ivanov On 18.12.2019 04:40, John Rose wrote: > Reviewed again. ?It reads well. ?I agree about future steps. ?? John > > On Dec 16, 2019, at 2:29 PM, Vladimir Ivanov > > wrote: >> >> Updated version: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ >> >> What's changed: >> ?* Added more comments >> ?* Fixed missing cases (Repl4B_imm() and Repl8B_imm) >> ???* Double-checked that there are no other missing cases left: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt > From vladimir.x.ivanov at oracle.com Wed Dec 18 10:02:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 18 Dec 2019 13:02:01 +0300 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <87o8w8ptsq.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> Message-ID: <3dd0917e-f307-5f22-dd68-786c90ec47a5@oracle.com> Hi Roland, > http://cr.openjdk.java.net/~roland/8231291/webrev.01/ As I understand, the intention of the change you propose is to perform complete loop unrolling earlier so EA can benefit from it. 
Some comments: src/hotspot/share/opto/loopnode.cpp: + if (mode == LoopOptsMaxUnroll) { + for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) { + IdealLoopTree* lpt = iter.current(); + if (lpt->is_innermost() && lpt->_allow_optimizations && !lpt->_has_call && lpt->is_counted()) { + lpt->compute_trip_count(this); + if (!lpt->do_one_iteration_loop(this) && + !lpt->do_remove_empty_loop(this)) { + AutoNodeBudget node_budget(this); + if (lpt->policy_maximally_unroll(this)) { + memset( worklist.adr(), 0, worklist.Size()*sizeof(Node*) ); + do_maximally_unroll(lpt, worklist); + } + } + } + } It looks like LoopOptsMaxUnroll is a shortened version of IdealLoopTree::iteration_split/iteration_split_impl(). Have you considered factoring out the common code? Right now, its hard to correlate the checks for LoopOptsMaxUnroll with iteration_split() and there's a risk they'll diverge eventually. Do you need the following steps from the original version? ================================================ // Look for loop-exit tests with my 50/50 guesses from the Parsing stage. // Replace with a 1-in-10 exit guess. if (!is_root() && is_loop()) { adjust_loop_exit_prob(phase); } // Compute loop trip count from profile data compute_profile_trip_cnt(phase); No use of profiling data since full unrolling is happening anyway? ======================== if (!cl->is_valid_counted_loop()) return true; // Ignore various kinds of broken loops ======================== // Do nothing special to pre- and post- loops if (cl->is_pre_loop() || cl->is_post_loop()) return true; I assume there are no pre-/post-loops exist at that point, so these checks are redundant. Turn them into asserts? ======================== if (cl->is_normal_loop()) { if (policy_unswitching(phase)) { phase->do_unswitching(this, old_new); return true; } if (policy_maximally_unroll(phase)) { // Here we did some unrolling and peeling. Eventually we will // completely unroll this loop and it will no longer be a loop. phase->do_maximally_unroll(this, old_new); return true; } You don't perform loop unswitching at all. So, the order of operations changes. Do you see any problems with that? ======================== src/hotspot/share/opto/compile.cpp // Perform escape analysis if (_do_escape_analysis && ConnectionGraph::has_candidates(this)) { if (has_loops()) { // Cleanup graph (remove dead nodes). TracePhase tp("idealLoop", &timers[_t_idealLoop]); - PhaseIdealLoop::optimize(igvn, LoopOptsNone); + PhaseIdealLoop::optimize(igvn, LoopOptsMaxUnroll); if (major_progress()) print_method(PHASE_PHASEIDEAL_BEFORE_EA, 2); if (failing()) return; } ConnectionGraph::do_analysis(this, &igvn); Does it make sense to do more elaborate checks before performing early full loop unrolling? Like whether candidates are used inside loop bodies? Best regards, Vladimir Ivanov > > As discussed before: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/035094.html > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036171.html > > Fully unrolling loops early helps EA. The change to cfgnode.cpp is > required because full unroll sometimes needs peeling which may add a phi > between a memory access and its AddP, a pattern that EA doesn't > recognize. > > Roland. 
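As a rough illustration of the "turn them into asserts" suggestion above, and assuming (as stated there) that no pre- or post-loops can exist yet when the LoopOptsMaxUnroll pass runs, the check inside the lpt->is_counted() branch quoted earlier could become something like the following sketch, which is not part of the webrev under review:

    CountedLoopNode* cl = lpt->_head->as_CountedLoop();
    // Pre-/post-loops are only produced by later iteration splitting,
    // so none are expected this early in the pipeline.
    assert(!cl->is_pre_loop() && !cl->is_post_loop(),
           "pre-/post-loops not expected before the first round of loop opts");
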
> From rkennke at redhat.com Wed Dec 18 12:40:48 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 18 Dec 2019 13:40:48 +0100 Subject: RFR: 8236181: C2: Remove useless step_over_gc_barrier() in int->bool conversion Message-ID: In cfgnode.cpp, in is_x2logic() that converts a diamond-shape if/else to simple bool patterns, we have a step_over_gc_barrier() at the end. This has been introduced by Shenandoah. I believe the intention was to convert obj vs null check to a simple boolean expression and eliminate the barrier on the unneeded path. However, it is not needed because Shenandoah we already eliminate barriers when the only user is a null-check, and it might actually be counter-productive if the barrier is needed on other paths, because it keeps the input of the barrier alive. This is probably a left-over from pre-LRB. Bug: https://bugs.openjdk.java.net/browse/JDK-8236181 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8236181/webrev.00/ Testing: hotspot_gc_shenandoah, submit-repo (in-progress) Can I please get a review? Thanks, Roman From christian.hagedorn at oracle.com Wed Dec 18 14:44:15 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 18 Dec 2019 15:44:15 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: <9b7da1b1-976d-7c57-7238-e7d38f30930c@oracle.com> Hi Felix > Yes, orig_msize is 0 for the test case in my webrev. > For the else case, I find it hard to manually create a test case for it. > Another choice is asserting that this else case never happens. I am not going that way as I haven't got a strong reason for that. Ok, maybe we could really just assert and bailout if the else case happens if you cannot find a test case which covers it? Might be best if someone else can comment on that, too, what to do in this case. Best regards, Christian >> Thanks for explaining. Following your analysis with the provided test case, >> orig_msize is 0 in the end. Can you also provide a test case or show an example >> which covers the else case in this test: >> >> 717 if (orig_msize == 0) { >> 718 best_align_to_mem_ref = >> memops.at(max_idx)->as_Mem(); >> 719 } else { >> 720 for (uint i = 0; i < orig_msize; i++) { >> 721 memops.remove(0); >> 722 } >> 723 best_align_to_mem_ref = find_align_to_ref(memops, >> max_idx); >> 724 assert(best_align_to_mem_ref == NULL, "sanity"); >> 725 best_align_to_mem_ref = >> memops.at(max_idx)->as_Mem(); >> 726 } From gromero at linux.vnet.ibm.com Wed Dec 18 15:34:25 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 18 Dec 2019 12:34:25 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: <56adbff7-ed96-5e54-8899-06b0efef212b@linux.vnet.ibm.com> Hi Matthias, On 12/11/2019 11:31 AM, Baesken, Matthias wrote: > Hi Gustavo, thanks for posting this . > I put your change into our internal build+test queue . Thanks a lot for testing it and catching the fastdebug assert() error. I'll send a fix to it in v2 in conjunction to Martin's requests. > We currently do not have something like you described ( a P9 QEMU VM (emulation) with NVDIMM support ) in our test landscape, but > It does not hurt to have the patch in our builds/tests anyway .... The change must work the same as on a POWER9 LPAR w/ vPMEM support, as described in [1] (try to ignore the sales pitch heh). 
So in effect it works just like the POWER9 QEMU VM I've tested: on QEMU it's file-backed on the host side, whilst vPMEM it's DRAM/DIMM-backed on PowerVM side, and vPMEM is really performant since it's a real HW. Best regards, Gustavo [1] https://ibmsystemsmag.com/Power-Systems/8/2019/Delivering-Persistence-Performance From gromero at linux.vnet.ibm.com Wed Dec 18 15:45:36 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 18 Dec 2019 12:45:36 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Hi Martin, On 12/11/2019 01:55 PM, Doerr, Martin wrote: > Hi Gustavo, > > thanks for implementing it. Unfortunately, we can't test it at the moment. Thanks a lot for the review. > I have a few change requests: > > > macroAssembler_ppc.cpp > I don't like silently emitting nothing in case !VM_Version::supports_data_cache_line_flush(). > If you want to check for it, I suggest to assert VM_Version::supports_data_cache_line_flush() and avoid generating the stub otherwise (stubGenerator_ppc). Fixed. > > ppc.ad > The predicates are redundant and should better get removed (useless evaluation). oh ... Fixed. > cacheWBPreSync could use cost 0 for clearity. (The costs don't have any effect because there is no choice for the matcher.) Fixed. > stubGenerator_ppc.cpp > I think checking cmpwi(... is_presync, 1) is ok because the ABI specifies that "bool true" is one Byte with the value 1 and the C calling convention enforces extension to 8 Byte. > I would have used andi_ + bne to be on the safe side, but I believe your version is ok. I decided for the safe side as you suggested :) > Comment "// post sync => emit 'lwsync'" is wrong. We use 'sync'. Sorry, it was a "thinko" when placing the comment. Indeed, the comment is wrong and the code is correct. Fixed. I've also fixed the assert() compilation error on fastdebug accordingly to Matthias' comments. Finally I tweaked a bit the 'format' strings in ppc.add to show a better output on +PrintAssembly. For instance, previously it would print something like: 090 B7: # out( B7 B8 ) <- in( B6 B7 ) Loop( B7-B7 inner ) Freq: 3.99733 090 MR R17, R15 // Long->Ptr 094 cache wb [R17] for the cache writeback. Now: 094 cache writeback, address = [R17] Please find v2 at: http://cr.openjdk.java.net/~gromero/8234599/v2/ Best regards, Gustavo From adityam at microsoft.com Wed Dec 18 16:44:26 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Wed, 18 Dec 2019 16:44:26 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS Message-ID: Hi all, I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. 
Bug: bugs.openjdk.java.net/browse/JDK-8236179 Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev The rest of this email contains more information about the issue and the analysis that led to the fix. Thank you, Aditya Mandaleeka ======== How this was found As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. ======== Analysis Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] 00000261`9beaf4e1 488d09 lea rcx,[rcx] 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx 00000261`9beaf4ec 488bca mov rcx,rdx 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) 00000261`9beaf500 41ffd2 call r10 The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. I found the following in the IR which seemed suspicious: wide_move [Base:[R653|M] Disp: 0|L] [R654|L] leal [Base:[R653|M] Disp: 0|L] [R655|L] move [R654|L] [rcx|L] move [R655|L] [rdx|I] rtcall ShenandoahRuntime::load_reference_barrier_native As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! 
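A rough sketch of the pointer-sized stack-to-register move described above, in the style of LIR_Assembler::stack2reg on x86; the exact guard and the symmetric reg2stack change are in the webrev, so treat this as illustrative only:

    // single-cpu destination case inside LIR_Assembler::stack2reg(src, dest, type)
    Address addr = frame_map()->address_for_slot(src->single_stack_ix());
    if (type == T_ADDRESS || type == T_OBJECT || type == T_ARRAY) {
      __ movptr(dest->as_register(), addr);  // full pointer width on x86_64
    } else {
      __ movl(dest->as_register(), addr);    // ordinary 32-bit value
    }

With T_ADDRESS routed through movptr, the spilled load address is reloaded into RDX as a 64-bit value and the truncation shown in the disassembly above goes away.
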
======== Testing done Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: tier1 tier2 hotspot_gc_shenandoah From vladimir.kozlov at oracle.com Wed Dec 18 17:40:50 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Dec 2019 09:40:50 -0800 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: Thank you, Tobias Vladimir On 12/17/19 10:59 PM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Best regards, > Tobias > > On 18.12.19 04:18, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8236000 >> >> C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 >> found other issues which were fixed too: >> >> #? Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 >> #? assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked >> >> and >> >> # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 >> # assert(DerivedPointerTable::is_active()) failed: Sanity >> >> Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci >> and tier1. >> >> Thanks, >> Vladimir From rkennke at redhat.com Wed Dec 18 18:02:21 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 18 Dec 2019 19:02:21 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: Thanks, Aditya! I'll sponsor this change for you, once you've got the necessary reviews. Thank you for your contribution! Roman > Hi all, > > I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. > > The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. > > Bug: bugs.openjdk.java.net/browse/JDK-8236179 > Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev > > The rest of this email contains more information about the issue and the analysis that led to the fix. > > Thank you, > Aditya Mandaleeka > > ======== > How this was found > > As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. 
> > ======== > Analysis > > Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): > > 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] > 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] > 00000261`9beaf4e1 488d09 lea rcx,[rcx] > 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx > 00000261`9beaf4ec 488bca mov rcx,rdx > 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] > 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) > 00000261`9beaf500 41ffd2 call r10 > > The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. > > I found the following in the IR which seemed suspicious: > > wide_move [Base:[R653|M] Disp: 0|L] [R654|L] > leal [Base:[R653|M] Disp: 0|L] [R655|L] > move [R654|L] [rcx|L] > move [R655|L] [rdx|I] > rtcall ShenandoahRuntime::load_reference_barrier_native > > As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. > > In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! > > ======== > Testing done > > Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: > tier1 > tier2 > hotspot_gc_shenandoah > From augustnagro at gmail.com Wed Dec 18 19:51:54 2019 From: augustnagro at gmail.com (August Nagro) Date: Wed, 18 Dec 2019 13:51:54 -0600 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> Message-ID: John, Apologies for the late response, school has been busy lately. I do like the idea of growing by a 'size table' or similar, especially if the growth rate decreases as the hashmap size increases. Since the probability of fragmentation increases with the number of resizes, bumping down the resize factor could be a good solution. To remove the bounds-checking in C2, I've taken a look at the subop class. 
I think I understand the change, but I'm wondering about the best way to handle a negative hash value. In fast range the hash & size must be less than 2^32, so that the result of their multiplication fits in a long. So one way is to do hash * (long) size, or similar. However, there's no unsigned multiply in Java, so if hash is negative it is widened to a negative long. One could use Integer::toUnsignedLong, or just directly `hash & 0xffffffffL`. However, this complicates the code shape. So I wonder if there is a better way. One option might be to make a Math::fastRange(int hash, int size) method that is a hotspot intrinsic and directly uses the unsigned instructions. Would appreciate some guidance on this. - August On Tue, Dec 3, 2019 at 12:06 AM John Rose wrote: > On Nov 18, 2019, at 12:17 PM, Florian Weimer wrote: > > > int bucket = fr(h * M); // M = 0x2357BD or something > > or maybe something fast and sloppy like: > > int bucket = fr(h + (h << 8)); > > > Surely this one works, since fr is the final operation. > The shift/add is just a mixing step to precondition the input. > > Just for the record, I?d like to keep brainstorming a bit more, > though surely this sort of thing is of limited interest. So, > just a little more from me on this. > > If we had BITR I?d want to try something like fr(h - bitr(h)). > > But what I keep wishing for a good one- or two-cycle instruction > that will mix all input bits into all the output bits, so that any > change in one input bit is likely to cause cascading changes in > many output bits, perhaps even 50% on average. A pair of AES > steps is a good example of this. I think AES would be superior to > multiply (for mixing) when used on 128 bit payloads or larger, so > it looks appealing (to me) for vectorizable hashing applications. > Though it is overkill on scalars, I think it points in promising > directions for scalars also. > > > or even: > > int bucket = fr(h) ^ (h & (N-1)); > > Does this really work? I don't think so. > > > Oops, you are right. Something like it might work, though. > > The idea, on paper, is that h & (N-1) is less than N, for any N >=1. > And if N-1 has a high enough pop-count the information content is close to > 50% of h (though maybe 50% of the bits are masked out). The xor of two > quasi-independent values both less than N is, well, less than 2^(ceil lg > N), > not N, which is a bug. Oops. There are ways to quickly combine two values > less than N and reduce the result to less than N: You do a conditional > subtract of N if the sum is >= N. > > So the tactical area I?m trying to explore here is to take two reduced > hashes developed in parallel, which depend on different bits of the > input, and combine them into a single stronger hash (not by ^). > > Maybe (I can?t resist hacking at it some more): > > int h1 = H1(h), h2 = H2(h); > int bucket = CCA(h1 - h2, N); > // where H1 := fr, H2(h) := (h & (N-1)) > // where CCA(x, N) := x + ((x >> 31) & N) // Conditional Compensating Add > > In this framework, a better H2 for favoring the low bits of h might be > H2(h) := ((1< the number of low bits of h that feed into the final bucket selection, > while fr (H1) arguably maximizes the number of influential high bits. > > I think this kind of perturbation is quite expensive. Arm's BITR should > be helpful here. > > > Yes, BITR is a helpful building block. 
If I understand you correctly, it > needs to be combined with other operations, such as multiply, shift, xor, > etc., > and can overcome biases towards high bits or towards low bits that come > from the simple arithmetic definitions of the other mixing operations. > > The hack with CCA(h1 - h2, N) seems competitive with a BITR-based > mixing step, since H2 can be very simple. > > A scalar variant of two AES steps (with xor of a second register or > constant > parameter at both stages) would be a better building block for strongly > mixing bits. > Or some other shallow linear network with a layer of non-linear S-boxes. > > But even though this operation is commonly needed and > easily implemented in hardware, it's rarely found in CPUs. > > > Yep; the whole cottage industry of building clever mixing functions > out of hand calculator functions could be sidelined if CPUs gave us good > cheap mixing primitives out of the box. The crypto literature is full of > them, and many are designed to be easy to implement in silicon. > > ? John > > P.S. I mention AES because I?m familiar with that bit of crypto tech, and > also because I actually tried it out once on the smhasher quality > benchmark. > No surprise in hindsight; it passes the quality tests with just two rounds. > Given that it is as cheap as multiplication, and handles twice as many > bits at a time, but requires two steps for full mixing, it would seem to > be competitive with multiplication as a mixing step. It has no built-in > biases towards high or low bits, so that?s an advantage over > multiplication. > > Why two rounds? The one-round version has flaws, as a hash function, > which are obvious on inspection of the simple structure of an AES round. > Not every output bit is data-dependent on every input bit of one round, > but two rounds swirls them all together. Are back-to-back AES rounds > expensive? Maybe, although that?s how the instructions are designed to > be used, about 10 of them back to back to do real crypto. > > From rkennke at redhat.com Wed Dec 18 23:03:30 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 19 Dec 2019 00:03:30 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: <93a1525a-5560-c9c7-e0ba-be3131d2dda3@redhat.com> Hi all, Testing via jdk/submit returned with PASSED. (btw, your URLs lack the http:// makes it slightler harder to follow them) Thanks, Roman On 12/18/19 5:44 PM, Aditya Mandaleeka wrote: > Hi all, > > I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. > > The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. > > Bug: bugs.openjdk.java.net/browse/JDK-8236179 > Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev > > The rest of this email contains more information about the issue and the analysis that led to the fix. 
> > Thank you, > Aditya Mandaleeka > > ======== > How this was found > > As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. > > ======== > Analysis > > Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): > > 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] > 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] > 00000261`9beaf4e1 488d09 lea rcx,[rcx] > 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx > 00000261`9beaf4ec 488bca mov rcx,rdx > 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] > 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) > 00000261`9beaf500 41ffd2 call r10 > > The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. > > I found the following in the IR which seemed suspicious: > > wide_move [Base:[R653|M] Disp: 0|L] [R654|L] > leal [Base:[R653|M] Disp: 0|L] [R655|L] > move [R654|L] [rcx|L] > move [R655|L] [rdx|I] > rtcall ShenandoahRuntime::load_reference_barrier_native > > As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. > > In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! 
> > ======== > Testing done > > Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: > tier1 > tier2 > hotspot_gc_shenandoah > From david.holmes at oracle.com Thu Dec 19 02:11:59 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 12:11:59 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> Message-ID: <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Hi Richard, I think my issue is with the way EliminateNestedLocks works so I'm going to look into that more deeply. Thanks for the explanations. David On 18/12/2019 12:47 am, Reingruber, Richard wrote: > Hi David, > > > > > Some further queries/concerns: > > > > > > > > src/hotspot/share/runtime/objectMonitor.cpp > > > > > > > > Can you please explain the changes to ObjectMonitor::wait: > > > > > > > > ! _recursions = save // restore the old recursion count > > > > ! + jt->get_and_reset_relock_count_after_wait(); // > > > > increased by the deferred relock count > > > > > > > > what is the "deferred relock count"? I gather it relates to > > > > > > > > "The code was extended to be able to deoptimize objects of a > > > frame that > > > > is not the top frame and to let another thread than the owning > > > thread do > > > > it." > > > > > > Yes, these relate. Currently EA based optimizations are reverted, when a compiled frame is > > > replaced with corresponding interpreter frames. Part of this is relocking objects with eliminated > > > locking. New with the enhancement is that we do this also just before object references are > > > acquired through JVMTI. In this case we deoptimize also the owning compiled frame C and we > > > register deoptimized objects as deferred updates. When control returns to C it gets deoptimized, > > > we notice that objects are already deoptimized (reallocated and relocked), so we don't do it again > > > (relocking twice would be incorrect of course). Deferred updates are copied into the new > > > interpreter frames. > > > > > > Problem: relocking is not possible if the target thread T is waiting on the monitor that needs to > > > be relocked. This happens only with non-local objects with EliminateNestedLocks. Instead relocking > > > is deferred until T owns the monitor again. This is what the piece of code above does. > > > > Sorry I need some more detail here. How can you wait() on an object > > monitor if the object allocation and/or locking was optimised away? And > > what is a "non-local object" in this context? Isn't EA restricted to > > thread-confined objects? > > "Non-local object" is an object that escapes its thread. The issue I'm addressing with the changes > in ObjectMonitor::wait are almost unrelated to EA. They are caused by EliminateNestedLocks, where C2 > eliminates recursive locking of an already owned lock. The lock owning object exists on the heap, it > is locked and you can call wait() on it. > > EliminateLocks is the C2 option that controls lock elimination based on EA. Both optimizations have > in common that objects with eliminated locking need to be relocked when deoptimizing a frame, > i.e. when replacing a compiled frame with equivalent interpreter > frames. Deoptimization::relock_objects does that job for /all/ eliminated locks in scope. /All/ can > be a mix of eliminated nested locks and locks of not-escaping objects. 
> > New with the enhancement: I call relock_objects earlier, just before objects pontentially > escape. But then later when the owning compiled frame gets deoptimized, I must not do it again: > > See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp: > > 373 if ((jvmci_enabled || ((DoEscapeAnalysis || EliminateNestedLocks) && EliminateLocks)) > 374 && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) { > 375 bool unused; > 376 eliminate_locks(thread, chunk, realloc_failures, deoptee, exec_mode, unused); > 377 } > > Now when calling relock_objects early it is quiet possible that I have to relock an object the > target thread currently waits for. Obviously I cannot relock in this case, instead I chose to > introduce relock_count_after_wait to JavaThread. > > > Is it just that some of the locking gets optimized away e.g. > > > > synchronised(obj) { > > synchronised(obj) { > > synchronised(obj) { > > obj.wait(); > > } > > } > > } > > > > If this is reduced to a form as-if it were a single lock of the monitor > > (due to EA) and the wait() triggers a JVM TI event which leads to the > > escape of "obj" then we need to reconstruct the true lock state, and so > > when the wait() internally unblocks and reacquires the monitor it has to > > set the true recursion count to 3, not the 1 that it appeared to be when > > wait() was initially called. Is that the scenario? > > Kind of... except that the locking is not eliminated due to EA and there is no JVM TI event > triggered by wait. > > Add > > LocalObject l1 = new LocalObject(); > > in front of the synchrnized blocks and assume a JVM TI agent acquires l1. This triggers the code in > question. > > See that relocking/reallocating is transactional. If it is done then for /all/ objects in scope and it is > done at most once. It wouldn't be quite so easy to split this in relocking of nested/EA-based > eliminated locks. > > > If so I find this truly awful. Anyone using wait() in a realistic form > > requires a notification and so the object cannot be thread confined. In > > It is not thread confined. > > > which case I would strongly argue that upon hitting the wait() the deopt > > should occur unconditionally and so the lock state is correct before we > > wait and so we don't need to mess with the recursion count internally > > when we reacquire the monitor. > > > > > > > > > which I don't like the sound of at all when it comes to ObjectMonitor > > > > state. So I'd like to understand in detail exactly what is going on here > > > > and why. This is a very intrusive change that seems to badly break > > > > encapsulation and impacts future changes to ObjectMonitor that are under > > > > investigation. > > > > > > I would not regard this as breaking encapsulation. Certainly not badly. > > > > > > I've added a property relock_count_after_wait to JavaThread. The property is well > > > encapsulated. Future ObjectMonitor implementations have to deal with recursion too. They are free > > > in choosing a way to do that as long as that property is taken into account. This is hardly a > > > limitation. > > > > I do think this badly breaks encapsulation as you have to add a callout > > from the guts of the ObjectMonitor code to reach into the thread to get > > this lock count adjustment. I understand why you have had to do this but > > I would much rather see a change to the EA optimisation strategy so that > > this is not needed. 
> > > > > Note also that the property is a straight forward extension of the existing concept of deferred > > > local updates. It is embedded into the structure holding them. So not even the footprint of a > > > JavaThread is enlarged if no deferred updates are generated. > > > > [...] > > > > > > > > I'm actually duplicating the existing external suspend mechanism, because a thread can be > > > suspended at most once. And hey, and don't like that either! But it seems not unlikely that the > > > duplicate can be removed together with the original and the new type of handshakes that will be > > > used for thread suspend can be used for object deoptimization too. See today's discussion in > > > JDK-8227745 [2]. > > > > I hope that discussion bears some fruit, at the moment it seems not to > > be possible to use handshakes here. :( > > > > The external suspend mechanism is a royal pain in the proverbial that we > > have to carefully live with. The idea that we're duplicating that for > > use in another fringe area of functionality does not thrill me at all. > > > > To be clear, I understand the problem that exists and that you wish to > > solve, but for the runtime parts I balk at the complexity cost of > > solving it. > > I know it's complex, but by far no rocket science. > > Also I find it hard to imagine another fix for JDK-8233915 besides changing the JVM TI specification. > > Thanks, Richard. > > -----Original Message----- > From: David Holmes > Sent: Dienstag, 17. Dezember 2019 08:03 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; Vladimir Kozlov (vladimir.kozlov at oracle.com) > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > > > David > > On 17/12/2019 4:57 pm, David Holmes wrote: >> Hi Richard, >> >> On 14/12/2019 5:01 am, Reingruber, Richard wrote: >>> Hi David, >>> >>> ?? > Some further queries/concerns: >>> ?? > >>> ?? > src/hotspot/share/runtime/objectMonitor.cpp >>> ?? > >>> ?? > Can you please explain the changes to ObjectMonitor::wait: >>> ?? > >>> ?? > !?? _recursions = save????? // restore the old recursion count >>> ?? > !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >>> ?? > increased by the deferred relock count >>> ?? > >>> ?? > what is the "deferred relock count"? I gather it relates to >>> ?? > >>> ?? > "The code was extended to be able to deoptimize objects of a >>> frame that >>> ?? > is not the top frame and to let another thread than the owning >>> thread do >>> ?? > it." >>> >>> Yes, these relate. Currently EA based optimizations are reverted, when >>> a compiled frame is replaced >>> with corresponding interpreter frames. Part of this is relocking >>> objects with eliminated >>> locking. New with the enhancement is that we do this also just before >>> object references are acquired >>> through JVMTI. In this case we deoptimize also the owning compiled >>> frame C and we register >>> deoptimized objects as deferred updates. When control returns to C it >>> gets deoptimized, we notice >>> that objects are already deoptimized (reallocated and relocked), so we >>> don't do it again (relocking >>> twice would be incorrect of course). Deferred updates are copied into >>> the new interpreter frames. >>> >>> Problem: relocking is not possible if the target thread T is waiting >>> on the monitor that needs to be >>> relocked. 
This happens only with non-local objects with >>> EliminateNestedLocks. Instead relocking is >>> deferred until T owns the monitor again. This is what the piece of >>> code above does. >> >> Sorry I need some more detail here. How can you wait() on an object >> monitor if the object allocation and/or locking was optimised away? And >> what is a "non-local object" in this context? Isn't EA restricted to >> thread-confined objects? >> >> Is it just that some of the locking gets optimized away e.g. >> >> synchronised(obj) { >> ? synchronised(obj) { >> ??? synchronised(obj) { >> ????? obj.wait(); >> ??? } >> ? } >> } >> >> If this is reduced to a form as-if it were a single lock of the monitor >> (due to EA) and the wait() triggers a JVM TI event which leads to the >> escape of "obj" then we need to reconstruct the true lock state, and so >> when the wait() internally unblocks and reacquires the monitor it has to >> set the true recursion count to 3, not the 1 that it appeared to be when >> wait() was initially called. Is that the scenario? >> >> If so I find this truly awful. Anyone using wait() in a realistic form >> requires a notification and so the object cannot be thread confined. In >> which case I would strongly argue that upon hitting the wait() the deopt >> should occur unconditionally and so the lock state is correct before we >> wait and so we don't need to mess with the recursion count internally >> when we reacquire the monitor. >> >>> >>> ?? > which I don't like the sound of at all when it comes to >>> ObjectMonitor >>> ?? > state. So I'd like to understand in detail exactly what is going >>> on here >>> ?? > and why.? This is a very intrusive change that seems to badly break >>> ?? > encapsulation and impacts future changes to ObjectMonitor that >>> are under >>> ?? > investigation. >>> >>> I would not regard this as breaking encapsulation. Certainly not badly. >>> >>> I've added a property relock_count_after_wait to JavaThread. The >>> property is well >>> encapsulated. Future ObjectMonitor implementations have to deal with >>> recursion too. They are free in >>> choosing a way to do that as long as that property is taken into >>> account. This is hardly a >>> limitation. >> >> I do think this badly breaks encapsulation as you have to add a callout >> from the guts of the ObjectMonitor code to reach into the thread to get >> this lock count adjustment. I understand why you have had to do this but >> I would much rather see a change to the EA optimisation strategy so that >> this is not needed. >> >>> Note also that the property is a straight forward extension of the >>> existing concept of deferred >>> local updates. It is embedded into the structure holding them. So not >>> even the footprint of a >>> JavaThread is enlarged if no deferred updates are generated. >>> >>> ?? > --- >>> ?? > >>> ?? > src/hotspot/share/runtime/thread.cpp >>> ?? > >>> ?? > Can you please explain why >>> JavaThread::wait_for_object_deoptimization >>> ?? > has to be handcrafted in this way rather than using proper >>> transitions. >>> ?? > >>> >>> I wrote wait_for_object_deoptimization taking >>> JavaThread::java_suspend_self_with_safepoint_check >>> as template. So in short: for the same reasons :) >>> >>> Threads reach both methods as part of thread state transitions, >>> therefore special handling is >>> required to change thread state on top of ongoing transitions. >>> >>> ?? > We got rid of "deopt suspend" some time ago and it is disturbing >>> to see >>> ?? > it being added back (effectively). 
This seems like it may be >>> something >>> ?? > that handshakes could be used for. >>> >>> Deopt suspend used to be something rather different with a similar >>> name[1]. It is not being added back. >> >> I stand corrected. Despite comments in the code to the contrary >> deopt_suspend didn't actually cause a self-suspend. I was doing a lot of >> cleanup in this area 13 years ago :) >> >>> >>> I'm actually duplicating the existing external suspend mechanism, >>> because a thread can be suspended >>> at most once. And hey, and don't like that either! But it seems not >>> unlikely that the duplicate can >>> be removed together with the original and the new type of handshakes >>> that will be used for >>> thread suspend can be used for object deoptimization too. See today's >>> discussion in JDK-8227745 [2]. >> >> I hope that discussion bears some fruit, at the moment it seems not to >> be possible to use handshakes here. :( >> >> The external suspend mechanism is a royal pain in the proverbial that we >> have to carefully live with. The idea that we're duplicating that for >> use in another fringe area of functionality does not thrill me at all. >> >> To be clear, I understand the problem that exists and that you wish to >> solve, but for the runtime parts I balk at the complexity cost of >> solving it. >> >> Thanks, >> David >> ----- >> >>> Thanks, Richard. >>> >>> [1] Deopt suspend was something like an async. handshake for >>> architectures with register windows, >>> ???? where patching the return pc for deoptimization of a compiled >>> frame was racy if the owner thread >>> ???? was in native code. Instead a "deopt" suspend flag was set on >>> which the thread patched its own >>> ???? frame upon return from native. So no thread was suspended. It got >>> its name only from the name of >>> ???? the flags. >>> >>> [2] Discussion about using handshakes to sync. with the target thread: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 >>> >>> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Freitag, 13. Dezember 2019 00:56 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> Some further queries/concerns: >>> >>> src/hotspot/share/runtime/objectMonitor.cpp >>> >>> Can you please explain the changes to ObjectMonitor::wait: >>> >>> !?? _recursions = save????? // restore the old recursion count >>> !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >>> increased by the deferred relock count >>> >>> what is the "deferred relock count"? I gather it relates to >>> >>> "The code was extended to be able to deoptimize objects of a frame that >>> is not the top frame and to let another thread than the owning thread do >>> it." >>> >>> which I don't like the sound of at all when it comes to ObjectMonitor >>> state. So I'd like to understand in detail exactly what is going on here >>> and why.? This is a very intrusive change that seems to badly break >>> encapsulation and impacts future changes to ObjectMonitor that are under >>> investigation. 
>>> >>> --- >>> >>> src/hotspot/share/runtime/thread.cpp >>> >>> Can you please explain why JavaThread::wait_for_object_deoptimization >>> has to be handcrafted in this way rather than using proper transitions. >>> >>> We got rid of "deopt suspend" some time ago and it is disturbing to see >>> it being added back (effectively). This seems like it may be something >>> that handshakes could be used for. >>> >>> Thanks, >>> David >>> ----- >>> >>> On 12/12/2019 7:02 am, David Holmes wrote: >>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: >>>>> Hi David, >>>>> >>>>> ??? > Most of the details here are in areas I can comment on in detail, >>>>> but I >>>>> ??? > did take an initial general look at things. >>>>> >>>>> Thanks for taking the time! >>>> >>>> Apologies the above should read: >>>> >>>> "Most of the details here are in areas I *can't* comment on in detail >>>> ..." >>>> >>>> David >>>> >>>>> ??? > The only thing that jumped out at me is that I think the >>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>> ??? > >>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Yes, it should. Will add the method like above. >>>>> >>>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>> Without >>>>> ??? > active testing this will just bit-rot. >>>>> >>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>> workload. I will add a minimal test >>>>> to keep it fresh. >>>>> >>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>> ??? > >>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>> ??? > >>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>> tiered is >>>>> ??? > our normal mode of operation. ?? >>>>> ??? > >>>>> >>>>> I removed the clause. I guess I wanted to target the tests towards the >>>>> code they are supposed to >>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>> with just one compiler thread. >>>>> >>>>> Additionally I will make use of >>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>>> To: Reingruber, Richard ; >>>>> serviceability-dev at openjdk.java.net; >>>>> hotspot-compiler-dev at openjdk.java.net; >>>>> hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>> Performance in the Presence of JVMTI Agents >>>>> >>>>> Hi Richard, >>>>> >>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>> Hi, >>>>>> >>>>>> I would like to get reviews please for >>>>>> >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>> >>>>>> Corresponding RFE: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>> >>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>> >>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>> issues (thanks!). In addition the >>>>>> change is being tested at SAP since I posted the first RFR some >>>>>> months ago. >>>>>> >>>>>> The intention of this enhancement is to benefit performance wise from >>>>>> escape analysis even if JVMTI >>>>>> agents request capabilities that allow them to access local variable >>>>>> values. E.g. 
if you start-up >>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>> escape analysis is disabled right >>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>> should do so. With the >>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>> debugger attaches. EA based >>>>>> optimizations are reverted just before an agent acquires the >>>>>> reference to an object. In the JBS item >>>>>> you'll find more details. >>>>> >>>>> Most of the details here are in areas I can comment on in detail, but I >>>>> did take an initial general look at things. >>>>> >>>>> The only thing that jumped out at me is that I think the >>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>> >>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>> Without >>>>> active testing this will just bit-rot. >>>>> >>>>> Also on the tests I don't understand your @requires clause: >>>>> >>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> (vm.opt.TieredCompilation != true)) >>>>> >>>>> This seems to require that TieredCompilation is disabled, but tiered is >>>>> our normal mode of operation. ?? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>> >>>>>> >>>>>> From smita.kamath at intel.com Thu Dec 19 02:33:08 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Thu, 19 Dec 2019 02:33:08 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Message-ID: Hi Vladimir, I have made the code changes you suggested (please look at the email below). I have also enabled the intrinsic to run only when VBMI2 feature is available. The intrinsic shows gains of >1.5x above 4k bit BigInteger. Webrev link: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ Thanks, Smita -----Original Message----- From: Vladimir Kozlov Sent: Wednesday, December 11, 2019 10:55 AM To: Kamath, Smita ; 'hotspot compiler' ; Viswanathan, Sandhya Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Hi Kamath, First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. Smita >>>done I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. 
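For readers following the intrinsic discussion: vpshldv/vpshrdv implement a "concatenate two neighbouring words and shift" step per vector lane, which is exactly the scalar work a multi-word (BigInteger-style) shift does for each word. A stand-alone scalar sketch of that inner loop, as an illustration of the operation only and not code from the webrev:

#include <cstdint>
#include <cstddef>

// Left-shift a big-endian array of 32-bit words by n bits, with 0 < n < 32.
// Each output word combines two adjacent input words -- the per-element
// operation that vpshldv applies to many lanes at once.
void shift_left_words(const uint32_t* src, uint32_t* dst, size_t words, unsigned n) {
    if (words == 0) return;
    for (size_t i = 0; i + 1 < words; i++) {
        dst[i] = (src[i] << n) | (src[i + 1] >> (32 - n));  // double-word shift per element
    }
    dst[words - 1] = src[words - 1] << n;                    // last word has no right neighbour
}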
We don't have flag for VNNI or other avx512 instructions subset. Smita >> removed UseVBMI2 flag In vm_version_x86.cpp you need to add more %s in print statement for new output. Smita >>> done You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. Smita >>> done You need to add intrinsics to Graal's test to ignore them: http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 Smita >>>done Thanks, Vladimir On 12/10/19 5:41 PM, Kamath, Smita wrote: > Hi, > > > As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 > > Link to webrev : > http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ > > > > I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > [1] > https://software.intel.com/sites/default/files/managed/39/c5/325462-sd > m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. > 2C 5-471) > > [2] > https://software.intel.com/en-us/articles/intel-software-development-e > mulator > > > Regards, > > Smita Kamath > From goetz.lindenmaier at sap.com Thu Dec 19 09:27:27 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Dec 2019 09:27:27 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: Hi David, Vladimir, stringStream is a ResourceObj, thus it lives on an arena. This is uncritical, as it does not resize. 8224193 only changed the allocation of the internal char*, which always caused problems with resizing under ResourceMarks that were not placed for the string but to free other memory. Thus stringStream must not be deallocated, and also there was no mem leak before that change. But we need to call the destructor to free the char*. Best regards, Goetz. > -----Original Message----- > From: hotspot-runtime-dev > On Behalf Of David Holmes > Sent: Mittwoch, 18. Dezember 2019 04:33 > To: Vladimir Kozlov ; hotspot-compiler- > dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > > CCing to Runtime group. > > > > For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > > I would definitively miss to do that if I use stringStreams in some new > > code. > > But that is not a problem added by this changeset, the problem is that > we're not deallocating these stringStreams even though we should be. If > you use a stringStream in new code you have to manage its lifecycle. > > That said why is this: > > if (_print_inlining_stream != NULL) > _print_inlining_stream->~stringStream(); > > not just: > > delete _print_inlining_stream; > > ? > > Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly > calling the destructor rather than calling delete? 
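To make the distinction concrete for readers who stop at this question: a resource/arena-allocated object must not be freed with delete, because its own storage belongs to the arena and is reclaimed in bulk; only the C-heap buffer it owns needs releasing, and that is what the explicit destructor call achieves. A minimal stand-alone C++ sketch of the pattern (simplified stand-ins, not HotSpot code):

#include <cstdlib>
#include <cstddef>
#include <new>

struct Buffer {                                    // stand-in for stringStream
    char* data;
    explicit Buffer(std::size_t n) : data(static_cast<char*>(std::malloc(n))) {}
    ~Buffer() { std::free(data); }                 // frees only what the object owns
};

int main() {
    alignas(Buffer) char arena[sizeof(Buffer)];    // stand-in for resource-area storage
    Buffer* b = new (arena) Buffer(64);            // placement new: object memory is not C-heap
    // delete b;    // wrong: delete would try to release storage the object doesn't own
    b->~Buffer();   // right: run the destructor so the malloc'd buffer is freed; the
                    // arena storage itself is reclaimed in bulk, independently
    return 0;
}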
> > Cheers, > David > > > May be someone can suggest some C++ trick to do that automatically. > > Thanks, > > Vladimir > > > > On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >> Hi, > >> > >> I'm resending this with fixed bugId ... > >> Sorry! > >> > >> Best regards, > >> ?? Goetz > >> > >>> Hi, > >>> > >>> PrintInlining and TraceLoopPredicate allocate stringStreams with new and > >>> relied on the fact that all memory used is on the ResourceArea cleaned > >>> after the compilation. > >>> > >>> Since 8224193 the char* of the stringStream is malloced and thus > >>> must be freed. No doing so manifests a memory leak. > >>> This is only relevant if the corresponding tracing is active. > >>> > >>> To fix TraceLoopPredicate I added the destructor call > >>> Fixing PrintInlining is a bit more complicated, as it uses several > >>> stringStreams. A row of them is in a GrowableArray which must > >>> be walked to free all of them. > >>> As the GrowableArray is on an arena no destructor is called for it. > >>> > >>> I also changed some as_string() calls to base() calls which reduced > >>> memory need of the traces, and added a comment explaining the > >>> constructor of GrowableArray that calls the copyconstructor for its > >>> elements. > >>> > >>> Please review: > >>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > c2_tracing_mem_leak/01/ > >>> > >>> Best regards, > >>> ?? Goetz. > >> From aph at redhat.com Thu Dec 19 10:31:44 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 19 Dec 2019 11:31:44 +0100 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> Message-ID: <2ca24471-a5e8-580a-66bc-37b0eece8898@redhat.com> On 12/3/19 6:06 AM, John Rose wrote: > But what I keep wishing for a good one- or two-cycle instruction > that will mix all input bits into all the output bits, so that any > change in one input bit is likely to cause cascading changes in > many output bits, perhaps even 50% on average. A pair of AES > steps is a good example of this. I've searched for something like this too. However, I experimented with two rounds of AES and I didn't get very good results. From what I remember, it took at least three or four rounds to get decent mixing, let alone full avalanche. Also, the latency of AES instructions tends to be a few cycles and the latency of moving data from integer to vector registers is as many as five cycles. So I gave up. This was a while ago, so probably badly remembered... -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Thu Dec 19 10:37:39 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 19 Dec 2019 10:37:39 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for the update. Looks good. Please remove the whitespaces between the instructions and '(' in generate_data_cache_writeback_sync() before pushing. 
Marked here by 'X': + __ andi_X(temp, is_presync, 1); + __ bneX(CCR0, SKIP); + __ cache_wbsync(false); // post sync => emit 'sync' + __ bindX(SKIP); // pre sync => emit nothing Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Mittwoch, 18. Dezember 2019 16:46 > To: Doerr, Martin ; Baesken, Matthias > > Cc: Andrew Dinn ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux > for JEP-352 > > Hi Martin, > > On 12/11/2019 01:55 PM, Doerr, Martin wrote: > > Hi Gustavo, > > > > thanks for implementing it. Unfortunately, we can't test it at the moment. > > Thanks a lot for the review. > > > > I have a few change requests: > > > > > > macroAssembler_ppc.cpp > > I don't like silently emitting nothing in case > !VM_Version::supports_data_cache_line_flush(). > > If you want to check for it, I suggest to assert > VM_Version::supports_data_cache_line_flush() and avoid generating the > stub otherwise (stubGenerator_ppc). > > Fixed. > > > > > > ppc.ad > > The predicates are redundant and should better get removed (useless > evaluation). > > oh ... Fixed. > > > > cacheWBPreSync could use cost 0 for clearity. (The costs don't have any > effect because there is no choice for the matcher.) > > Fixed. > > > > stubGenerator_ppc.cpp > > I think checking cmpwi(... is_presync, 1) is ok because the ABI specifies that > "bool true" is one Byte with the value 1 and the C calling convention enforces > extension to 8 Byte. > > I would have used andi_ + bne to be on the safe side, but I believe your > version is ok. > > I decided for the safe side as you suggested :) > > > > Comment "// post sync => emit 'lwsync'" is wrong. We use 'sync'. > > Sorry, it was a "thinko" when placing the comment. Indeed, the comment is > wrong > and the code is correct. Fixed. > > I've also fixed the assert() compilation error on fastdebug accordingly to > Matthias' comments. > > Finally I tweaked a bit the 'format' strings in ppc.add to show a better output > on +PrintAssembly. For instance, previously it would print something like: > > 090 B7: # out( B7 B8 ) <- in( B6 B7 ) Loop( B7-B7 inner ) Freq: 3.99733 > 090 MR R17, R15 // Long->Ptr > 094 cache wb [R17] > > for the cache writeback. Now: > > 094 cache writeback, address = [R17] > > > Please find v2 at: > > http://cr.openjdk.java.net/~gromero/8234599/v2/ > > > Best regards, > Gustavo From david.holmes at oracle.com Thu Dec 19 10:38:41 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 20:38:41 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > Hi David, Vladimir, > > stringStream is a ResourceObj, thus it lives on an arena. > This is uncritical, as it does not resize. > 8224193 only changed the allocation of the internal char*, > which always caused problems with resizing under > ResourceMarks that were not placed for the string but to > free other memory. > Thus stringStream must not be deallocated, and > also there was no mem leak before that change. > But we need to call the destructor to free the char*. I think we have a confusing mix of arena and C_heap usage with stringStream. Not clear to me why stringStream remains a resourceObj now? In many cases the stringStream is just local on the stack. 
In other cases if it is new'd then it should be C-heap same as the array and then you could delete it too. What you have may suffice to initially address the leak but I think this whole thing needs revisiting. Thanks, David > > Best regards, > Goetz. > >> -----Original Message----- >> From: hotspot-runtime-dev >> On Behalf Of David Holmes >> Sent: Mittwoch, 18. Dezember 2019 04:33 >> To: Vladimir Kozlov ; hotspot-compiler- >> dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>> CCing to Runtime group. >>> >>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>> I would definitively miss to do that if I use stringStreams in some new >>> code. >> >> But that is not a problem added by this changeset, the problem is that >> we're not deallocating these stringStreams even though we should be. If >> you use a stringStream in new code you have to manage its lifecycle. >> >> That said why is this: >> >> if (_print_inlining_stream != NULL) >> _print_inlining_stream->~stringStream(); >> >> not just: >> >> delete _print_inlining_stream; >> >> ? >> >> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >> calling the destructor rather than calling delete? >> >> Cheers, >> David >> >>> May be someone can suggest some C++ trick to do that automatically. >>> Thanks, >>> Vladimir >>> >>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> I'm resending this with fixed bugId ... >>>> Sorry! >>>> >>>> Best regards, >>>> ?? Goetz >>>> >>>>> Hi, >>>>> >>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new and >>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>> after the compilation. >>>>> >>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>> must be freed. No doing so manifests a memory leak. >>>>> This is only relevant if the corresponding tracing is active. >>>>> >>>>> To fix TraceLoopPredicate I added the destructor call >>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>> stringStreams. A row of them is in a GrowableArray which must >>>>> be walked to free all of them. >>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>> >>>>> I also changed some as_string() calls to base() calls which reduced >>>>> memory need of the traces, and added a comment explaining the >>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>> elements. >>>>> >>>>> Please review: >>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >> c2_tracing_mem_leak/01/ >>>>> >>>>> Best regards, >>>>> ?? Goetz. >>>> From goetz.lindenmaier at sap.com Thu Dec 19 11:35:54 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Dec 2019 11:35:54 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: Hi, yes, it is confusing that parts are on the arena, other parts are allocated in the C-heap. But usages which allocate the stringStream with new() are rare, usually it's allocated on the stack making all this more simple. And the previous design was even more error-prone. 
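For completeness, the common stack-allocated case mentioned above needs no manual cleanup at all: the destructor runs at scope exit and frees the C-heap buffer regardless of any ResourceMark. Roughly, as an illustrative HotSpot-style fragment rather than code from the patch (method_name is a placeholder):

{
  stringStream ss;                        // internal buffer is malloc'd since 8224193
  ss.print("inlining %s", method_name);   // method_name: placeholder for whatever is traced
  tty->print_raw(ss.base());              // base() avoids the extra copy that as_string() makes
}                                         // ~stringStream() runs here and frees the buffer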
Also, the whole way to print the inlining information is quite complex, with strange usage of the copy constructor of PrintInliningBuffer ... which reaches into GrowableArray which should have a constructor that does not use the copy constructor to initialize the elements ... I do not intend to change stringStream in this change. So can I consider this reviewed from your side? Or at least that there is no veto :)? Thanks and best regards, Goetz. > -----Original Message----- > From: David Holmes > Sent: Donnerstag, 19. Dezember 2019 11:39 > To: Lindenmaier, Goetz ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > > Hi David, Vladimir, > > > > stringStream is a ResourceObj, thus it lives on an arena. > > This is uncritical, as it does not resize. > > 8224193 only changed the allocation of the internal char*, > > which always caused problems with resizing under > > ResourceMarks that were not placed for the string but to > > free other memory. > > Thus stringStream must not be deallocated, and > > also there was no mem leak before that change. > > But we need to call the destructor to free the char*. > > I think we have a confusing mix of arena and C_heap usage with > stringStream. Not clear to me why stringStream remains a resourceObj > now? In many cases the stringStream is just local on the stack. In other > cases if it is new'd then it should be C-heap same as the array and then > you could delete it too. > > What you have may suffice to initially address the leak but I think this > whole thing needs revisiting. > > Thanks, > David > > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: hotspot-runtime-dev bounces at openjdk.java.net> > >> On Behalf Of David Holmes > >> Sent: Mittwoch, 18. Dezember 2019 04:33 > >> To: Vladimir Kozlov ; hotspot-compiler- > >> dev at openjdk.java.net > >> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> > >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >> '8224193: stringStream should not use Resouce Area'. > >> > >> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>> CCing to Runtime group. > >>> > >>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > >>> I would definitively miss to do that if I use stringStreams in some new > >>> code. > >> > >> But that is not a problem added by this changeset, the problem is that > >> we're not deallocating these stringStreams even though we should be. If > >> you use a stringStream in new code you have to manage its lifecycle. > >> > >> That said why is this: > >> > >> if (_print_inlining_stream != NULL) > >> _print_inlining_stream->~stringStream(); > >> > >> not just: > >> > >> delete _print_inlining_stream; > >> > >> ? > >> > >> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly > >> calling the destructor rather than calling delete? > >> > >> Cheers, > >> David > >> > >>> May be someone can suggest some C++ trick to do that automatically. > >>> Thanks, > >>> Vladimir > >>> > >>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>> Hi, > >>>> > >>>> I'm resending this with fixed bugId ... > >>>> Sorry! > >>>> > >>>> Best regards, > >>>> ?? 
Goetz > >>>> > >>>>> Hi, > >>>>> > >>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new > and > >>>>> relied on the fact that all memory used is on the ResourceArea cleaned > >>>>> after the compilation. > >>>>> > >>>>> Since 8224193 the char* of the stringStream is malloced and thus > >>>>> must be freed. No doing so manifests a memory leak. > >>>>> This is only relevant if the corresponding tracing is active. > >>>>> > >>>>> To fix TraceLoopPredicate I added the destructor call > >>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>> be walked to free all of them. > >>>>> As the GrowableArray is on an arena no destructor is called for it. > >>>>> > >>>>> I also changed some as_string() calls to base() calls which reduced > >>>>> memory need of the traces, and added a comment explaining the > >>>>> constructor of GrowableArray that calls the copyconstructor for its > >>>>> elements. > >>>>> > >>>>> Please review: > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >> c2_tracing_mem_leak/01/ > >>>>> > >>>>> Best regards, > >>>>> ?? Goetz. > >>>> From goetz.lindenmaier at sap.com Thu Dec 19 11:37:24 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Dec 2019 11:37:24 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: One more thing: I think I should push this to jdk14, right? It's a P3 bug. Best regards, Goetz > -----Original Message----- > From: David Holmes > Sent: Donnerstag, 19. Dezember 2019 11:39 > To: Lindenmaier, Goetz ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > > Hi David, Vladimir, > > > > stringStream is a ResourceObj, thus it lives on an arena. > > This is uncritical, as it does not resize. > > 8224193 only changed the allocation of the internal char*, > > which always caused problems with resizing under > > ResourceMarks that were not placed for the string but to > > free other memory. > > Thus stringStream must not be deallocated, and > > also there was no mem leak before that change. > > But we need to call the destructor to free the char*. > > I think we have a confusing mix of arena and C_heap usage with > stringStream. Not clear to me why stringStream remains a resourceObj > now? In many cases the stringStream is just local on the stack. In other > cases if it is new'd then it should be C-heap same as the array and then > you could delete it too. > > What you have may suffice to initially address the leak but I think this > whole thing needs revisiting. > > Thanks, > David > > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: hotspot-runtime-dev bounces at openjdk.java.net> > >> On Behalf Of David Holmes > >> Sent: Mittwoch, 18. Dezember 2019 04:33 > >> To: Vladimir Kozlov ; hotspot-compiler- > >> dev at openjdk.java.net > >> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> > >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >> '8224193: stringStream should not use Resouce Area'. 
> >> > >> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>> CCing to Runtime group. > >>> > >>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > >>> I would definitively miss to do that if I use stringStreams in some new > >>> code. > >> > >> But that is not a problem added by this changeset, the problem is that > >> we're not deallocating these stringStreams even though we should be. If > >> you use a stringStream in new code you have to manage its lifecycle. > >> > >> That said why is this: > >> > >> if (_print_inlining_stream != NULL) > >> _print_inlining_stream->~stringStream(); > >> > >> not just: > >> > >> delete _print_inlining_stream; > >> > >> ? > >> > >> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly > >> calling the destructor rather than calling delete? > >> > >> Cheers, > >> David > >> > >>> May be someone can suggest some C++ trick to do that automatically. > >>> Thanks, > >>> Vladimir > >>> > >>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>> Hi, > >>>> > >>>> I'm resending this with fixed bugId ... > >>>> Sorry! > >>>> > >>>> Best regards, > >>>> ?? Goetz > >>>> > >>>>> Hi, > >>>>> > >>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new > and > >>>>> relied on the fact that all memory used is on the ResourceArea cleaned > >>>>> after the compilation. > >>>>> > >>>>> Since 8224193 the char* of the stringStream is malloced and thus > >>>>> must be freed. No doing so manifests a memory leak. > >>>>> This is only relevant if the corresponding tracing is active. > >>>>> > >>>>> To fix TraceLoopPredicate I added the destructor call > >>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>> be walked to free all of them. > >>>>> As the GrowableArray is on an arena no destructor is called for it. > >>>>> > >>>>> I also changed some as_string() calls to base() calls which reduced > >>>>> memory need of the traces, and added a comment explaining the > >>>>> constructor of GrowableArray that calls the copyconstructor for its > >>>>> elements. > >>>>> > >>>>> Please review: > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >> c2_tracing_mem_leak/01/ > >>>>> > >>>>> Best regards, > >>>>> ?? Goetz. > >>>> From vladimir.x.ivanov at oracle.com Thu Dec 19 11:38:46 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 19 Dec 2019 14:38:46 +0300 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: <6e19116b-a94e-1634-488c-b1573da0d707@oracle.com> Aditya, > I'll sponsor this change for you, once you've got the necessary reviews. Please, either post the webrev on cr.openjdk.java.net or just include the patch inline. Best regards, Vladimir Ivanov > > Thank you for your contribution! > Roman > > >> Hi all, >> >> I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. 
>> >> The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. >> >> Bug: bugs.openjdk.java.net/browse/JDK-8236179 >> Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev >> >> The rest of this email contains more information about the issue and the analysis that led to the fix. >> >> Thank you, >> Aditya Mandaleeka >> >> ======== >> How this was found >> >> As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. >> >> ======== >> Analysis >> >> Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): >> >> 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] >> 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] >> 00000261`9beaf4e1 488d09 lea rcx,[rcx] >> 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx >> 00000261`9beaf4ec 488bca mov rcx,rdx >> 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] >> 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) >> 00000261`9beaf500 41ffd2 call r10 >> >> The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. >> >> I found the following in the IR which seemed suspicious: >> >> wide_move [Base:[R653|M] Disp: 0|L] [R654|L] >> leal [Base:[R653|M] Disp: 0|L] [R655|L] >> move [R654|L] [rcx|L] >> move [R655|L] [rdx|I] >> rtcall ShenandoahRuntime::load_reference_barrier_native >> >> As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. >> >> In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! 
>> >> ======== >> Testing done >> >> Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: >> tier1 >> tier2 >> hotspot_gc_shenandoah >> > From david.holmes at oracle.com Thu Dec 19 12:52:22 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 22:52:22 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: > Hi, > > yes, it is confusing that parts are on the arena, other parts > are allocated in the C-heap. > But usages which allocate the stringStream with new() are > rare, usually it's allocated on the stack making all this > more simple. And the previous design was even more > error-prone. > Also, the whole way to print the inlining information > is quite complex, with strange usage of the copy constructor > of PrintInliningBuffer ... which reaches into GrowableArray > which should have a constructor that does not use the > copy constructor to initialize the elements ... > > I do not intend to change stringStream in this change. > So can I consider this reviewed from your side? Or at > least that there is no veto :)? Sorry I was trying to convey this is Reviewed, but I do think this needs further work in the future. Thanks, David > Thanks and best regards, > Goetz. > > >> -----Original Message----- >> From: David Holmes >> Sent: Donnerstag, 19. Dezember 2019 11:39 >> To: Lindenmaier, Goetz ; Vladimir Kozlov >> ; hotspot-compiler-dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>> Hi David, Vladimir, >>> >>> stringStream is a ResourceObj, thus it lives on an arena. >>> This is uncritical, as it does not resize. >>> 8224193 only changed the allocation of the internal char*, >>> which always caused problems with resizing under >>> ResourceMarks that were not placed for the string but to >>> free other memory. >>> Thus stringStream must not be deallocated, and >>> also there was no mem leak before that change. >>> But we need to call the destructor to free the char*. >> >> I think we have a confusing mix of arena and C_heap usage with >> stringStream. Not clear to me why stringStream remains a resourceObj >> now? In many cases the stringStream is just local on the stack. In other >> cases if it is new'd then it should be C-heap same as the array and then >> you could delete it too. >> >> What you have may suffice to initially address the leak but I think this >> whole thing needs revisiting. >> >> Thanks, >> David >> >>> >>> Best regards, >>> Goetz. >>> >>>> -----Original Message----- >>>> From: hotspot-runtime-dev > bounces at openjdk.java.net> >>>> On Behalf Of David Holmes >>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>> To: Vladimir Kozlov ; hotspot-compiler- >>>> dev at openjdk.java.net >>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>> dev at openjdk.java.net> >>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>> '8224193: stringStream should not use Resouce Area'. >>>> >>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>> CCing to Runtime group. 
>>>>> >>>>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>>>> I would definitively miss to do that if I use stringStreams in some new >>>>> code. >>>> >>>> But that is not a problem added by this changeset, the problem is that >>>> we're not deallocating these stringStreams even though we should be. If >>>> you use a stringStream in new code you have to manage its lifecycle. >>>> >>>> That said why is this: >>>> >>>> if (_print_inlining_stream != NULL) >>>> _print_inlining_stream->~stringStream(); >>>> >>>> not just: >>>> >>>> delete _print_inlining_stream; >>>> >>>> ? >>>> >>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >>>> calling the destructor rather than calling delete? >>>> >>>> Cheers, >>>> David >>>> >>>>> May be someone can suggest some C++ trick to do that automatically. >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I'm resending this with fixed bugId ... >>>>>> Sorry! >>>>>> >>>>>> Best regards, >>>>>> ?? Goetz >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new >> and >>>>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>>>> after the compilation. >>>>>>> >>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>> >>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>> be walked to free all of them. >>>>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>>>> >>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>> memory need of the traces, and added a comment explaining the >>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>> elements. >>>>>>> >>>>>>> Please review: >>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>> c2_tracing_mem_leak/01/ >>>>>>> >>>>>>> Best regards, >>>>>>> ?? Goetz. >>>>>> From david.holmes at oracle.com Thu Dec 19 12:54:15 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 22:54:15 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: <0392412b-93e1-d99d-f919-96c90c895e30@oracle.com> On 19/12/2019 9:37 pm, Lindenmaier, Goetz wrote: > One more thing: > > I think I should push this to jdk14, right? > It's a P3 bug. Yes this can go to 14 (and will forward port to 15 automatically). Thanks, David > Best regards, > Goetz > >> -----Original Message----- >> From: David Holmes >> Sent: Donnerstag, 19. Dezember 2019 11:39 >> To: Lindenmaier, Goetz ; Vladimir Kozlov >> ; hotspot-compiler-dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>> Hi David, Vladimir, >>> >>> stringStream is a ResourceObj, thus it lives on an arena. >>> This is uncritical, as it does not resize. 
>>> 8224193 only changed the allocation of the internal char*, >>> which always caused problems with resizing under >>> ResourceMarks that were not placed for the string but to >>> free other memory. >>> Thus stringStream must not be deallocated, and >>> also there was no mem leak before that change. >>> But we need to call the destructor to free the char*. >> >> I think we have a confusing mix of arena and C_heap usage with >> stringStream. Not clear to me why stringStream remains a resourceObj >> now? In many cases the stringStream is just local on the stack. In other >> cases if it is new'd then it should be C-heap same as the array and then >> you could delete it too. >> >> What you have may suffice to initially address the leak but I think this >> whole thing needs revisiting. >> >> Thanks, >> David >> >>> >>> Best regards, >>> Goetz. >>> >>>> -----Original Message----- >>>> From: hotspot-runtime-dev > bounces at openjdk.java.net> >>>> On Behalf Of David Holmes >>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>> To: Vladimir Kozlov ; hotspot-compiler- >>>> dev at openjdk.java.net >>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>> dev at openjdk.java.net> >>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>> '8224193: stringStream should not use Resouce Area'. >>>> >>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>> CCing to Runtime group. >>>>> >>>>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>>>> I would definitively miss to do that if I use stringStreams in some new >>>>> code. >>>> >>>> But that is not a problem added by this changeset, the problem is that >>>> we're not deallocating these stringStreams even though we should be. If >>>> you use a stringStream in new code you have to manage its lifecycle. >>>> >>>> That said why is this: >>>> >>>> if (_print_inlining_stream != NULL) >>>> _print_inlining_stream->~stringStream(); >>>> >>>> not just: >>>> >>>> delete _print_inlining_stream; >>>> >>>> ? >>>> >>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >>>> calling the destructor rather than calling delete? >>>> >>>> Cheers, >>>> David >>>> >>>>> May be someone can suggest some C++ trick to do that automatically. >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I'm resending this with fixed bugId ... >>>>>> Sorry! >>>>>> >>>>>> Best regards, >>>>>> ?? Goetz >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new >> and >>>>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>>>> after the compilation. >>>>>>> >>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>> >>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>> be walked to free all of them. >>>>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>>>> >>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>> memory need of the traces, and added a comment explaining the >>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>> elements. 
>>>>>>> >>>>>>> Please review: >>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>> c2_tracing_mem_leak/01/ >>>>>>> >>>>>>> Best regards, >>>>>>> ?? Goetz. >>>>>> From gromero at linux.vnet.ibm.com Thu Dec 19 13:43:50 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 19 Dec 2019 10:43:50 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: Hi Martin, On 12/19/2019 07:37 AM, Doerr, Martin wrote: > Hi Gustavo, > > thanks for the update. Looks good. > > Please remove the whitespaces between the instructions and '(' in generate_data_cache_writeback_sync() before pushing. > Marked here by 'X': > + __ andi_X(temp, is_presync, 1); > + __ bneX(CCR0, SKIP); > + __ cache_wbsync(false); // post sync => emit 'sync' > + __ bindX(SKIP); // pre sync => emit nothing Just for records, I uploaded v3 without the whitespaces to: http://cr.openjdk.java.net/~gromero/8234599/v3/ instead of fixing it in place. Should I wait any test to complete at SAP side? Thanks a lot. Best regards, Gustavo From adinn at redhat.com Thu Dec 19 14:05:46 2019 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 19 Dec 2019 14:05:46 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: <5a48bb64-2ed8-cfaa-0577-7db5200be4a1@redhat.com> Hi Gustavo, On 19/12/2019 13:43, Gustavo Romero wrote: > http://cr.openjdk.java.net/~gromero/8234599/v3/ Disclaimer: I reviewed an early version of this patch offline before anything arrived on hotspot-compiler-dev. Just for the record I am happy with this final version. I make no claim to have understood the ppc-specific aspects correctly but I am happy that Matthias and Martin will have covered that part of the review adequately. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From rwestrel at redhat.com Thu Dec 19 14:14:51 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 19 Dec 2019 15:14:51 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: <87h81wl7vo.fsf@redhat.com> Hi Aditya, AFAIK, it's a requirement that the patch be posted on the openjdk infrastructure. So here it is: http://cr.openjdk.java.net/~roland/8236179/webrev.00/ The change looks good to me but it would be good to check whether architectures other than x86 need a similar change. Roland. 
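For readers who want to see the failure mode outside of C1: the bug is simply that a 64-bit address spilled to the stack was reloaded through a 32-bit move, so everything above 2^32 was lost; the fix routes T_ADDRESS values through pointer-sized moves instead. A stand-alone illustration of that difference on a little-endian machine such as x86_64 (my own sketch with an example value, not code from the webrev):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    uint64_t addr = 0x000002619beaf550ULL;   // example heap address with bits above 2^32
    uint8_t slot[8];
    std::memcpy(slot, &addr, 8);             // 64-bit spill: mov qword ptr [rsp+...], rcx

    uint32_t lo;
    std::memcpy(&lo, slot, 4);               // 32-bit reload: mov edx, dword ptr [rsp+...]
    uint64_t full;
    std::memcpy(&full, slot, 8);             // pointer-sized reload, as required for T_ADDRESS

    std::printf("original       %#llx\n", (unsigned long long)addr);
    std::printf("32-bit reload  %#llx\n", (unsigned long long)lo);    // upper bits are gone
    std::printf("64-bit reload  %#llx\n", (unsigned long long)full);
    return 0;
}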
From gromero at linux.vnet.ibm.com Thu Dec 19 14:19:30 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 19 Dec 2019 11:19:30 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <5a48bb64-2ed8-cfaa-0577-7db5200be4a1@redhat.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> <5a48bb64-2ed8-cfaa-0577-7db5200be4a1@redhat.com> Message-ID: <5baa6af8-d48c-3a51-f851-f0e27e3eeb79@linux.vnet.ibm.com> Hi Andrew, On 12/19/2019 11:05 AM, Andrew Dinn wrote: > Hi Gustavo, > > On 19/12/2019 13:43, Gustavo Romero wrote: >> http://cr.openjdk.java.net/~gromero/8234599/v3/ > Disclaimer: I reviewed an early version of this patch offline before > anything arrived on hotspot-compiler-dev. > > Just for the record I am happy with this final version. I make no claim > to have understood the ppc-specific aspects correctly but I am happy > that Matthias and Martin will have covered that part of the review > adequately. Thanks for the reviews. Also, thanks a lot for all the discussions and suggestions on how to test the change on different scenarios on Power, w/ and w/o pmem support. Best regards, Gustavo From matthias.baesken at sap.com Thu Dec 19 14:07:34 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Thu, 19 Dec 2019 14:07:34 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: Hi Gustavo , please remove the blank after __ bind too, as suggested by Martin . http://cr.openjdk.java.net/~gromero/8234599/v3/src/hotspot/cpu/ppc/stubGenerator_ppc.cpp.frames.html 3070 __ bind (SKIP); // pre sync => emit nothing Otherwise looks good to me . Thanks, Matthias > > Hi Martin, > > On 12/19/2019 07:37 AM, Doerr, Martin wrote: > > Hi Gustavo, > > > > thanks for the update. Looks good. > > > > Please remove the whitespaces between the instructions and '(' in > generate_data_cache_writeback_sync() before pushing. > > Marked here by 'X': > > + __ andi_X(temp, is_presync, 1); > > + __ bneX(CCR0, SKIP); > > + __ cache_wbsync(false); // post sync => emit 'sync' > > + __ bindX(SKIP); // pre sync => emit nothing <-------------------------------------------------------- > > Just for records, I uploaded v3 without the whitespaces to: > > http://cr.openjdk.java.net/~gromero/8234599/v3/ > > instead of fixing it in place. > > Should I wait any test to complete at SAP side? > > Thanks a lot. > > > Best regards, > Gustavo From gromero at linux.vnet.ibm.com Thu Dec 19 14:39:10 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 19 Dec 2019 11:39:10 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: <212d785f-13fa-1c77-26d4-f973d032fb56@linux.vnet.ibm.com> Hi Matthias, On 12/19/2019 11:07 AM, Baesken, Matthias wrote: > Hi Gustavo , please remove the blank after __ bind too, as suggested by Martin . > > http://cr.openjdk.java.net/~gromero/8234599/v3/src/hotspot/cpu/ppc/stubGenerator_ppc.cpp.frames.html > > 3070 __ bind (SKIP); // pre sync => emit nothing > > Otherwise looks good to me . Amazing how many times I was able to mess with the spaces. 
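Stepping back from the formatting nits: generate_data_cache_writeback[_sync] backs MappedByteBuffer.force() on NVRAM-backed mappings (JEP 352) -- write the touched cache lines back to the persistence domain, then order those writebacks. As a rough point of comparison only, the x86_64 counterpart of what the PPC64 stub emits is a cache-line writeback plus a store fence; a small compiler-intrinsics sketch of that, assuming CLWB support (-mclwb) and not taken from JDK code:

#include <immintrin.h>

// Write one cache line back towards memory without evicting it.
static inline void writeback_line(void* p) {
    _mm_clwb(p);
}

// Order all preceding writebacks (the "post sync" step discussed above).
static inline void writeback_sync() {
    _mm_sfence();
}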
v4 with Andrew added as Reviewer: http://cr.openjdk.java.net/~gromero/8234599/v4/ I plan push it to jdk/jdk today. Thanks! Best regards, Gustavo From augustnagro at gmail.com Thu Dec 19 14:38:49 2019 From: augustnagro at gmail.com (August Nagro) Date: Thu, 19 Dec 2019 08:38:49 -0600 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <2ca24471-a5e8-580a-66bc-37b0eece8898@redhat.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> <2ca24471-a5e8-580a-66bc-37b0eece8898@redhat.com> Message-ID: <2F5736F4-3B57-4E97-9A0D-9B32B2EDD82A@gmail.com> One thing I?ve come to realize is that the bit-mixing step in fast-range does not require a high quality hash function. It just needs to distribute the bits so that the probability of falling into the buckets [0, 2^32), [2^32, 2 * 2^32), [2*2^32, 3*2^32), ?, is even. This is why I?m so enthusiastic about fibonacci hashing for this operation, since it costs only a single multiply. To prove it works, take a look at this sample. The integer gFactor is equal to 2^32 / phi, where phi is the Golden Ratio. Note that hash * gFactor is 32-bit multiplication, and we AND with 0xffffffffL for the unsigned multiply with int size. Set::add returns false if the element is present. import java.util.HashSet; import java.util.Set; class Scratch { public static void main(String[] args) { int gFactor = -1640531527; int size = 1_000; int range = 4_000; int collisions = 0; Set set = new HashSet<>(); for (int hash = 0; hash < range; ++hash) { int reduced = (int) (((hash * gFactor) & 0xffffffffL) * size >>> 32); if (!set.add(reduced)) collisions++; } System.out.println("Optimal collisions: " + (range - size)); System.out.println("Actual collisions: " + collisions); } } > On Dec 19, 2019, at 4:31 AM, Andrew Haley wrote: > > On 12/3/19 6:06 AM, John Rose wrote: >> But what I keep wishing for a good one- or two-cycle instruction >> that will mix all input bits into all the output bits, so that any >> change in one input bit is likely to cause cascading changes in >> many output bits, perhaps even 50% on average. A pair of AES >> steps is a good example of this. > > I've searched for something like this too. However, I > experimented with two rounds of AES and I didn't get very good > results. From what I remember, it took at least three or four > rounds to get decent mixing, let alone full avalanche. > > Also, the latency of AES instructions tends to be a few cycles > and the latency of moving data from integer to vector registers > is as many as five cycles. So I gave up. > > This was a while ago, so probably badly remembered... > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From martin.doerr at sap.com Thu Dec 19 16:31:05 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 19 Dec 2019 16:31:05 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <87h81wl7vo.fsf@redhat.com> References: <87h81wl7vo.fsf@redhat.com> Message-ID: Hi everybody, thanks for fixing this issue. I guess it's currently used on some platforms, but I think we should fix it for all platforms. Otherwise it will break when using the parts which were only fixed for x86. 
Here's my proposal: http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ I'll run tests on more platforms. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Roland Westrelin > Sent: Donnerstag, 19. Dezember 2019 15:15 > To: Aditya Mandaleeka ; hotspot compiler > > Cc: shenandoah-dev > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > Hi Aditya, > > AFAIK, it's a requirement that the patch be posted on the openjdk > infrastructure. So here it is: > > http://cr.openjdk.java.net/~roland/8236179/webrev.00/ > > The change looks good to me but it would be good to check whether > architectures other than x86 need a similar change. > > Roland. From adityam at microsoft.com Thu Dec 19 17:37:47 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 19 Dec 2019 17:37:47 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <6e19116b-a94e-1634-488c-b1573da0d707@oracle.com> References: <6e19116b-a94e-1634-488c-b1573da0d707@oracle.com> Message-ID: Thanks for the feedback Vladimir. I don't yet have access to cr.openjdk.java.net, but will paste inline diffs in the future. For now, it appears Martin Doerr has posted an updated webrev on cr.openjdk.java.net in another fork of this mail thread. Thanks, Aditya -----Original Message----- From: Vladimir Ivanov Sent: Thursday, December 19, 2019 3:39 AM To: Aditya Mandaleeka ; hotspot compiler Cc: Roman Kennke ; shenandoah-dev Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS Aditya, > I'll sponsor this change for you, once you've got the necessary reviews. Please, either post the webrev on cr.openjdk.java.net or just include the patch inline. Best regards, Vladimir Ivanov > > Thank you for your contribution! > Roman > > >> Hi all, >> >> I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. >> >> The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. >> >> Bug: bugs.openjdk.java.net/browse/JDK-8236179 >> Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev >> >> The rest of this email contains more information about the issue and the analysis that led to the fix. >> >> Thank you, >> Aditya Mandaleeka >> >> ======== >> How this was found >> >> As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. 
>> >> ======== >> Analysis >> >> Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): >> >> 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] >> 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] >> 00000261`9beaf4e1 488d09 lea rcx,[rcx] >> 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx >> 00000261`9beaf4ec 488bca mov rcx,rdx >> 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] >> 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) >> 00000261`9beaf500 41ffd2 call r10 >> >> The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. >> >> I found the following in the IR which seemed suspicious: >> >> wide_move [Base:[R653|M] Disp: 0|L] [R654|L] >> leal [Base:[R653|M] Disp: 0|L] [R655|L] >> move [R654|L] [rcx|L] >> move [R655|L] [rdx|I] >> rtcall ShenandoahRuntime::load_reference_barrier_native >> >> As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. >> >> In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! >> >> ======== >> Testing done >> >> Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: >> tier1 >> tier2 >> hotspot_gc_shenandoah >> > From adityam at microsoft.com Thu Dec 19 17:49:19 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 19 Dec 2019 17:49:19 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: Thanks for updating the other platforms Martin. Those changes look right to me. -Aditya -----Original Message----- From: Doerr, Martin Sent: Thursday, December 19, 2019 8:31 AM To: Roland Westrelin ; Aditya Mandaleeka ; hotspot compiler Cc: shenandoah-dev Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS Hi everybody, thanks for fixing this issue. I guess it's currently used on some platforms, but I think we should fix it for all platforms. Otherwise it will break when using the parts which were only fixed for x86. 
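As an aside, a minimal standalone C++ sketch (hypothetical address value, not HotSpot code) of the failure mode described in the analysis above: a pointer spilled as a 64-bit value but reloaded through a 32-bit move keeps only its low half, which is what the "mov edx, dword ptr [rsp+1B8h]" in the disassembly does to the reference address.

#include <cstdint>
#include <cstdio>

int main() {
  uint64_t ref_addr   = 0x000002619beaf500ULL;   // example 64-bit heap address (above 4 GB)
  uint64_t stack_slot = ref_addr;                // 64-bit spill:  mov qword ptr [rsp+...], rcx
  uint32_t reloaded   = (uint32_t)stack_slot;    // 32-bit reload: mov edx, dword ptr [rsp+...]
  std::printf("spilled : 0x%016llx\n", (unsigned long long)stack_slot);
  std::printf("reloaded: 0x%016llx\n", (unsigned long long)(uint64_t)reloaded);
  // reloaded is 0x000000009beaf500: the high 32 bits are gone, so the barrier
  // receives a bogus reference address and the subsequent CAS faults.
  return 0;
}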
Here's my proposal: https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openjdk.java.net%2F~mdoerr%2F8236179_C1_T_ADDRESS%2Fwebrev.01%2F&data=02%7C01%7Cadityam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=r11YVMnHSLm1Ms1Ipbq4vPDOhIwlrM8fz1QlAl%2BUWGY%3D&reserved=0 I'll run tests on more platforms. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Roland Westrelin > Sent: Donnerstag, 19. Dezember 2019 15:15 > To: Aditya Mandaleeka ; hotspot compiler > > Cc: shenandoah-dev > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > Hi Aditya, > > AFAIK, it's a requirement that the patch be posted on the openjdk > infrastructure. So here it is: > > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open > jdk.java.net%2F~roland%2F8236179%2Fwebrev.00%2F&data=02%7C01%7Cadi > tyam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf86f1 > 41af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=vQ1xR87EjA > bf%2Bnwscs1c%2BpTqWLfeVODLz%2FleIsdmthU%3D&reserved=0 > > The change looks good to me but it would be good to check whether > architectures other than x86 need a similar change. > > Roland. From vladimir.kozlov at oracle.com Thu Dec 19 18:06:41 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 10:06:41 -0800 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Please, file RFE for refactoring stringStream. Yes, the fix can go into JDK 14. But before that, I see the same pattern used 3 times in compile.cpp: + if (_print_inlining_stream != NULL) _print_inlining_stream->~stringStream(); _print_inlining_stream = ; Can you use one function for that? Also our coding style requires to put body of 'if' on separate line and use {}. thanks, Vladimir On 12/19/19 4:52 AM, David Holmes wrote: > On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: >> Hi, >> >> yes, it is confusing that parts are on the arena, other parts >> are allocated in the C-heap. >> But usages which allocate the stringStream with new() are >> rare, usually it's allocated on the stack making all this >> more simple.? And the previous design was even more >> error-prone. >> Also, the whole way to print the inlining information >> is quite complex, with strange usage of the copy constructor >> of PrintInliningBuffer ... which reaches into GrowableArray >> which should have a constructor that does not use the >> copy constructor to initialize the elements ... >> >> I do not intend to change stringStream in this change. >> So can I consider this reviewed from your side? Or at >> least that there is no veto :)? > > Sorry I was trying to convey this is Reviewed, but I do think this needs further work in the future. > > Thanks, > David > >> Thanks and best regards, >> ?? Goetz. >> >> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Donnerstag, 19. Dezember 2019 11:39 >>> To: Lindenmaier, Goetz ; Vladimir Kozlov >>> ; hotspot-compiler-dev at openjdk.java.net >>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> >>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>> '8224193: stringStream should not use Resouce Area'. 
>>> >>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>>> Hi David, Vladimir, >>>> >>>> stringStream is a ResourceObj, thus it lives on an arena. >>>> This is uncritical, as it does not resize. >>>> 8224193 only changed the allocation of the internal char*, >>>> which always caused problems with resizing under >>>> ResourceMarks that were not placed for the string but to >>>> free other memory. >>>> Thus stringStream must not be deallocated, and >>>> also there was no mem leak before that change. >>>> But we need to call the destructor to free the char*. >>> >>> I think we have a confusing mix of arena and C_heap usage with >>> stringStream. Not clear to me why stringStream remains a resourceObj >>> now? In many cases the stringStream is just local on the stack. In other >>> cases if it is new'd then it should be C-heap same as the array and then >>> you could delete it too. >>> >>> What you have may suffice to initially address the leak but I think this >>> whole thing needs revisiting. >>> >>> Thanks, >>> David >>> >>>> >>>> Best regards, >>>> ??? Goetz. >>>> >>>>> -----Original Message----- >>>>> From: hotspot-runtime-dev >> bounces at openjdk.java.net> >>>>> On Behalf Of David Holmes >>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>>> To: Vladimir Kozlov ; hotspot-compiler- >>>>> dev at openjdk.java.net >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>> dev at openjdk.java.net> >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>>> '8224193: stringStream should not use Resouce Area'. >>>>> >>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>>> CCing to Runtime group. >>>>>> >>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>>>>> I would definitively miss to do that if I use stringStreams in some new >>>>>> code. >>>>> >>>>> But that is not a problem added by this changeset, the problem is that >>>>> we're not deallocating these stringStreams even though we should be.? If >>>>> you use a stringStream in new code you have to manage its lifecycle. >>>>> >>>>> That said why is this: >>>>> >>>>> ??? if (_print_inlining_stream != NULL) >>>>> _print_inlining_stream->~stringStream(); >>>>> >>>>> not just: >>>>> >>>>> delete _print_inlining_stream; >>>>> >>>>> ? >>>>> >>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >>>>> calling the destructor rather than calling delete? >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> May be someone can suggest some C++ trick to do that automatically. >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm resending this with fixed bugId ... >>>>>>> Sorry! >>>>>>> >>>>>>> Best regards, >>>>>>> ? ?? Goetz >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new >>> and >>>>>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>>>>> after the compilation. >>>>>>>> >>>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>>> >>>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>>> be walked to free all of them. >>>>>>>> As the GrowableArray is on an arena no destructor is called for it. 
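To make the lifecycle question concrete, here is a minimal generic C++ sketch (toy types, not the actual HotSpot classes) of the pattern under discussion: an arena never runs destructors for the objects placed on it, so when such an object owns a C-heap buffer the buffer can only be reclaimed by an explicit destructor call, unless the object itself is moved to the C heap so that a plain delete works.

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

struct Arena {                                   // toy bump allocator; freed wholesale, runs no destructors
  alignas(std::max_align_t) char buf[4096];
  size_t used = 0;
  void* alloc(size_t n) { void* p = buf + used; used += n; return p; }
};

struct Stream {                                  // stand-in for a stream that owns a malloc'd buffer
  char* data;
  explicit Stream(const char* s) {
    data = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    std::strcpy(data, s);
  }
  ~Stream() { std::free(data); }                 // must run, or the buffer leaks
};

int main() {
  Arena arena;
  Stream* s = new (arena.alloc(sizeof(Stream))) Stream("inlining trace");
  // ... use s ...
  s->~Stream();    // explicit destructor call frees the C-heap buffer; the Stream's
  return 0;        // own memory simply disappears with the arena, so delete would be wrong here
}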
>>>>>>>> >>>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>>> memory need of the traces, and added a comment explaining the >>>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>>> elements. >>>>>>>> >>>>>>>> Please review: >>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>>> c2_tracing_mem_leak/01/ >>>>>>>> >>>>>>>> Best regards, >>>>>>>> ? ?? Goetz. >>>>>>> From vladimir.x.ivanov at oracle.com Thu Dec 19 18:15:26 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 19 Dec 2019 21:15:26 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: Thanks for the feedback, Vladimir. > c1_CodeStubs.hpp - I think it should be stronger than assert to catch it > in product too (we can do check in product because it is not performance > critical code). Do you prefer to see guarantee/fatal instead? Frankly speaking, even the assert doesn't look warranted enough. I put it there mainly to validate my own changes. I could have introduced ConversionStubs for x86-64 as well, but decided to simplify the implementation. Considering x86-32 is the only consumer, I'd prefer to have ConversionStub x86_32-specific instead and completely hide it from other platforms, but it requires putting more #ifdefs in shared code which I don't like either. So, if you see a value in having a runtime check in product binaries there, I'll put it there. > c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we > use AMD64: > > https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 Yes, you are right. Good catch! :-) Best regards, Vladimir Ivanov > On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-7175279 >> >> There was a major rewrite of math intrinsics which in 9 time frame >> which almost completely eliminated x87 code in x86-64 code base. >> >> Proposed patch removes the rest and makes x86-64 code x87-free. >> >> The main motivation for the patch is to completely eliminate >> non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] >> and related enhancements [2]. >> >> Most of the changes are in C1, but there is one case in template >> interpreter (java_lang_math_abs) which now uses >> StubRoutines::x86::double_sign_mask(). It forces its initialization to >> be moved to StubRoutines::initialize1(). >> >> x87 instructions are made available only on x86-32. >> >> C1 changes involve removing FPU support on x86-64 and effectively make >> x86-specific support in linear scan allocator [2] x86-32-only. >> >> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >> >> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >> >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp >> From vladimir.kozlov at oracle.com Thu Dec 19 18:34:25 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 10:34:25 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> On 12/19/19 10:15 AM, Vladimir Ivanov wrote: > Thanks for the feedback, Vladimir. 
> >> c1_CodeStubs.hpp - I think it should be stronger than assert to catch it in product too (we can do check in product >> because it is not performance critical code). > > Do you prefer to see guarantee/fatal instead? > > Frankly speaking, even the assert doesn't look warranted enough. > I put it there mainly to validate my own changes. I could have introduced ConversionStubs for x86-64 as well, but > decided to simplify the implementation. > > Considering x86-32 is the only consumer, I'd prefer to have ConversionStub x86_32-specific instead and completely hide > it from other platforms, but it requires putting more #ifdefs in shared code which I don't like either. > > So, if you see a value in having a runtime check in product binaries there, I'll put it there. Make ConversionStub x86_32-specific only if possible. From what I see it is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I actually can't see how it could be only 32-specific. Hmm? Vladimir K > >> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: >> >> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 > > Yes, you are right. Good catch! :-) > > Best regards, > Vladimir Ivanov > >> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>> >>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in >>> x86-64 code base. >>> >>> Proposed patch removes the rest and makes x86-64 code x87-free. >>> >>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM >>> for JEP 306 [1] and related enhancements [2]. >>> >>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses >>> StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>> >>> x87 instructions are made available only on x86-32. >>> >>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator >>> [2] x86-32-only. >>> >>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>> >>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>> >>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From vladimir.x.ivanov at oracle.com Thu Dec 19 18:40:46 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 19 Dec 2019 21:40:46 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> Message-ID: <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> > Make ConversionStub x86_32-specific only if possible. From what I see it > is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I actually > can't see how it could be only 32-specific. Hmm? I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp which I don't like. So, I don't consider it as an option right now. Best regards, Vladimir Ivanov >>> c1_LinearScan.cpp - I think IA64 is used for Itanium. 
For 64-bit x86 >>> we use AMD64: >>> >>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >> >> >> Yes, you are right. Good catch! :-) >> >> Best regards, >> Vladimir Ivanov >> >>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>> >>>> There was a major rewrite of math intrinsics which in 9 time frame >>>> which almost completely eliminated x87 code in x86-64 code base. >>>> >>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>> >>>> The main motivation for the patch is to completely eliminate >>>> non-strictfp behaving code in order to prepare the JVM for JEP 306 >>>> [1] and related enhancements [2]. >>>> >>>> Most of the changes are in C1, but there is one case in template >>>> interpreter (java_lang_math_abs) which now uses >>>> StubRoutines::x86::double_sign_mask(). It forces its initialization >>>> to be moved to StubRoutines::initialize1(). >>>> >>>> x87 instructions are made available only on x86-32. >>>> >>>> C1 changes involve removing FPU support on x86-64 and effectively >>>> make x86-specific support in linear scan allocator [2] x86-32-only. >>>> >>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>> >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>> >>>> [3] >>>> http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp >>>> From vladimir.kozlov at oracle.com Thu Dec 19 18:52:53 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 10:52:53 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> Message-ID: <660d8e71-6fcd-3e71-fac3-7a4caa787872@oracle.com> On 12/19/19 10:40 AM, Vladimir Ivanov wrote: > >> Make ConversionStub x86_32-specific only if possible. From what I see it is only LIR_OpConvert in c1_LIR.hpp we have >> to deal with. I actually can't see how it could be only 32-specific. Hmm? > > I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp which I don't like. So, I don't consider it as an > option right now. Okay, NOT_IA32( ShouldNotReachHere() ) with comment in c1_CodeStubs.hpp should be enough for now. Vladimir K > > Best regards, > Vladimir Ivanov > >>>> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: >>>> >>>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >>> >>> >>> Yes, you are right. Good catch! :-) >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>>> >>>>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in >>>>> x86-64 code base. >>>>> >>>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>>> >>>>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM >>>>> for JEP 306 [1] and related enhancements [2]. 
>>>>> >>>>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses >>>>> StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>>>> >>>>> x87 instructions are made available only on x86-32. >>>>> >>>>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan >>>>> allocator [2] x86-32-only. >>>>> >>>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>>> >>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>>> >>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From igor.veresov at oracle.com Thu Dec 19 19:59:53 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 19 Dec 2019 11:59:53 -0800 Subject: [14] RFR(M) 8235927: Update Graal Message-ID: JBS: https://bugs.openjdk.java.net/browse/JDK-8235927 Webrev: http://cr.openjdk.java.net/~iveresov/8235927/webrev/ Please find the list of changes in the JBS issue. Thanks, igor From ekaterina.pavlova at oracle.com Thu Dec 19 20:34:35 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 19 Dec 2019 12:34:35 -0800 Subject: [14] RFR(T/XXS) 8236139: [Graal] java/lang/RuntimeTests/exec/LotsOfOutput.java fails with JVMCI enabled Message-ID: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> Hi, please review very trivial fix which returns LotsOfOutput.java test back to Graal specific problem list file. The test was recently moved from java/lang/Runtime/exec directory to java/lang/RuntimeTests but corresponding entry in test/jdk/ProblemList-graal.txt was not patched. This is why the test started to fail in latest build. JBS: https://bugs.openjdk.java.net/browse/JDK-8236139 webrev: http://cr.openjdk.java.net/~epavlova//8236139/webrev.00/index.html thanks, -katya From vladimir.kozlov at oracle.com Thu Dec 19 20:43:39 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 12:43:39 -0800 Subject: [14] RFR(T/XXS) 8236139: [Graal] java/lang/RuntimeTests/exec/LotsOfOutput.java fails with JVMCI enabled In-Reply-To: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> References: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> Message-ID: <6a44546d-1933-b8c1-82d1-21eb6c382e88@oracle.com> Good and trivial. Thanks, Vladimir On 12/19/19 12:34 PM, Ekaterina Pavlova wrote: > Hi, > > please review very trivial fix which returns LotsOfOutput.java test back to Graal specific problem list file. > The test was recently moved from java/lang/Runtime/exec directory to java/lang/RuntimeTests but corresponding > entry in test/jdk/ProblemList-graal.txt was not patched. This is why the test started to fail in latest build. > > > ??? JBS: https://bugs.openjdk.java.net/browse/JDK-8236139 > ?webrev: http://cr.openjdk.java.net/~epavlova//8236139/webrev.00/index.html > > > thanks, > -katya From vladimir.kozlov at oracle.com Thu Dec 19 20:54:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 12:54:02 -0800 Subject: [14] RFR(M) 8235927: Update Graal In-Reply-To: References: Message-ID: <285e0384-3c94-c0fd-466f-2a0faaa1f3b0@oracle.com> Looks good. There seem no new testing failures, only old. 
Thanks, Vladimir On 12/19/19 11:59 AM, Igor Veresov wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8235927 > Webrev: http://cr.openjdk.java.net/~iveresov/8235927/webrev/ > > Please find the list of changes in the JBS issue. > > Thanks, > igor > > > From ekaterina.pavlova at oracle.com Thu Dec 19 21:41:02 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 19 Dec 2019 13:41:02 -0800 Subject: [14] RFR(T/XXS) 8236139: [Graal] java/lang/RuntimeTests/exec/LotsOfOutput.java fails with JVMCI enabled In-Reply-To: <6a44546d-1933-b8c1-82d1-21eb6c382e88@oracle.com> References: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> <6a44546d-1933-b8c1-82d1-21eb6c382e88@oracle.com> Message-ID: <9471ece1-75b9-e6b9-3aa5-9759cad0c001@oracle.com> Thanks Vladimir for prompt review, integrated. -katya On 12/19/19 12:43 PM, Vladimir Kozlov wrote: > Good and trivial. > > Thanks, > Vladimir > > On 12/19/19 12:34 PM, Ekaterina Pavlova wrote: >> Hi, >> >> please review very trivial fix which returns LotsOfOutput.java test back to Graal specific problem list file. >> The test was recently moved from java/lang/Runtime/exec directory to java/lang/RuntimeTests but corresponding >> entry in test/jdk/ProblemList-graal.txt was not patched. This is why the test started to fail in latest build. >> >> >> ???? JBS: https://bugs.openjdk.java.net/browse/JDK-8236139 >> ??webrev: http://cr.openjdk.java.net/~epavlova//8236139/webrev.00/index.html >> >> >> thanks, >> -katya From vladimir.x.ivanov at oracle.com Thu Dec 19 21:58:19 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 20 Dec 2019 00:58:19 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <660d8e71-6fcd-3e71-fac3-7a4caa787872@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> <660d8e71-6fcd-3e71-fac3-7a4caa787872@oracle.com> Message-ID: <0b0897e0-1dbc-306d-b2cb-31de13fb8b34@oracle.com> >>> Make ConversionStub x86_32-specific only if possible. From what I see >>> it is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I >>> actually can't see how it could be only 32-specific. Hmm? >> >> I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp >> which I don't like. So, I don't consider it as an? > option right now. > > Okay, NOT_IA32( ShouldNotReachHere() ) with comment in c1_CodeStubs.hpp > should be enough for now. Incremental diff: http://cr.openjdk.java.net/~vlivanov/7175279/webrev.01-00/ Best regards, Vladimir Ivanov >>>>> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit >>>>> x86 we use AMD64: >>>>> >>>>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >>>> >>>> >>>> >>>> Yes, you are right. Good catch! :-) >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>>>> >>>>>> There was a major rewrite of math intrinsics which in 9 time frame >>>>>> which almost completely eliminated x87 code in x86-64 code base. >>>>>> >>>>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>>>> >>>>>> The main motivation for the patch is to completely eliminate >>>>>> non-strictfp behaving code in order to prepare the JVM for JEP 306 >>>>>> [1] and related enhancements [2]. 
>>>>>> >>>>>> Most of the changes are in C1, but there is one case in template >>>>>> interpreter (java_lang_math_abs) which now uses >>>>>> StubRoutines::x86::double_sign_mask(). It forces its >>>>>> initialization to be moved to StubRoutines::initialize1(). >>>>>> >>>>>> x87 instructions are made available only on x86-32. >>>>>> >>>>>> C1 changes involve removing FPU support on x86-64 and effectively >>>>>> make x86-specific support in linear scan allocator [2] x86-32-only. >>>>>> >>>>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>>>> >>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>>>> >>>>>> [3] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp >>>>>> From vladimir.kozlov at oracle.com Thu Dec 19 22:00:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 14:00:02 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <0b0897e0-1dbc-306d-b2cb-31de13fb8b34@oracle.com> References: <0b0897e0-1dbc-306d-b2cb-31de13fb8b34@oracle.com> Message-ID: <7063EB29-D415-4A48-BA4F-B16C5A1F52F8@oracle.com> Good Thanks Vladimir > On Dec 19, 2019, at 1:58 PM, Vladimir Ivanov wrote: > > ? >>>> Make ConversionStub x86_32-specific only if possible. From what I see it is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I actually can't see how it could be only 32-specific. Hmm? >>> >>> I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp which I don't like. So, I don't consider it as an > option right now. >> Okay, NOT_IA32( ShouldNotReachHere() ) with comment in c1_CodeStubs.hpp should be enough for now. > > Incremental diff: > http://cr.openjdk.java.net/~vlivanov/7175279/webrev.01-00/ > > Best regards, > Vladimir Ivanov > >>>>>> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: >>>>>> >>>>>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >>>>> >>>>> >>>>> >>>>> Yes, you are right. Good catch! :-) >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>>>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>>>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>>>>> >>>>>>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. >>>>>>> >>>>>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>>>>> >>>>>>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. >>>>>>> >>>>>>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>>>>>> >>>>>>> x87 instructions are made available only on x86-32. >>>>>>> >>>>>>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. >>>>>>> >>>>>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. 
>>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>>>>> >>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>>>>> >>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From igor.veresov at oracle.com Thu Dec 19 23:12:06 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 19 Dec 2019 15:12:06 -0800 Subject: [14] RFR(M) 8235927: Update Graal In-Reply-To: <285e0384-3c94-c0fd-466f-2a0faaa1f3b0@oracle.com> References: <285e0384-3c94-c0fd-466f-2a0faaa1f3b0@oracle.com> Message-ID: <6AC6A96E-4055-4629-862D-256152EB7E6C@oracle.com> Thanks, Vladimir! igor > On Dec 19, 2019, at 12:54 PM, Vladimir Kozlov wrote: > > Looks good. There seem no new testing failures, only old. > > Thanks, > Vladimir > > On 12/19/19 11:59 AM, Igor Veresov wrote: >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235927 >> Webrev: http://cr.openjdk.java.net/~iveresov/8235927/webrev/ >> Please find the list of changes in the JBS issue. >> Thanks, >> igor From sandhya.viswanathan at intel.com Fri Dec 20 00:39:40 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 20 Dec 2019 00:39:40 +0000 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) Message-ID: TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) With Generic Operands (JDK-8234391), TEMP is now specialized to a vector register of max_vector_size(). There is a correctness issue with this. Say if we have an ad file instruct which had TEMP as vecX or vecY and it was required that the vector register be limited to xmm0-15 for KNL but not for SKX. Now we replaced that TEMP by max_vector_size(), the Matcher::specialize_generic_vector_operand() sets the temp to vecZ which has the range xmm0-31. As a background: vecX/vecY is the entire range (xmm0-xmm31) for SKX and (xmm0-xmm15) for KNL. vecZ is entire range (xmm0-xmm31) for both SKX and KNL. Such cases arise for add and mul reduction rules and the long replicates in x86.ad. The fix proposed for JDK 14 is to specialize the TEMP to legVecZ instead for KNL, thereby limiting it to xmm0-15. JBS: https://bugs.openjdk.java.net/browse/JDK-8236364 Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ Best Regards, Sandhya From vladimir.kozlov at oracle.com Fri Dec 20 01:17:13 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 17:17:13 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Message-ID: <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: StubRoutines::_bigIntegerRightShiftWorker StubRoutines::_bigIntegerLeftShiftWorker In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ On 12/18/19 6:33 PM, Kamath, Smita wrote: > Hi Vladimir, > > I have made the code changes you suggested (please look at the email below). > I have also enabled the intrinsic to run only when VBMI2 feature is available. > The intrinsic shows gains of >1.5x above 4k bit BigInteger. 
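For readers unfamiliar with the term, a small scalar C++ sketch of the per-word "double precision" shift that a multi-word left shift needs at every word boundary: each output word funnels in bits from its lower neighbour, which is essentially the per-lane operation the VPSHLDV/VPSHRDV instructions referenced above perform across a whole vector of words. This is simplified for brevity (shift counts limited to 1..63, the result is not widened, and 64-bit words are used even though BigInteger itself stores 32-bit words).

#include <cstdint>
#include <cstdio>

// Shift a little-endian array of 64-bit words left by s bits (0 < s < 64);
// bits shifted out of the top word are discarded in this simplified version.
static void shift_left(const uint64_t* src, uint64_t* dst, int n, unsigned s) {
  for (int i = n - 1; i > 0; --i) {
    dst[i] = (src[i] << s) | (src[i - 1] >> (64 - s));  // funnel bits in from the lower neighbour
  }
  dst[0] = src[0] << s;
}

int main() {
  uint64_t src[2] = { 0xfedcba9876543210ULL, 0x0123456789abcdefULL };  // src[0] is the low word
  uint64_t dst[2];
  shift_left(src, dst, 2, 8);
  std::printf("%016llx %016llx\n", (unsigned long long)dst[1], (unsigned long long)dst[0]);
  return 0;
}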
> > Webrev link: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ > > Thanks, > Smita > > -----Original Message----- > From: Vladimir Kozlov > Sent: Wednesday, December 11, 2019 10:55 AM > To: Kamath, Smita ; 'hotspot compiler' ; Viswanathan, Sandhya > Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 > > Hi Kamath, > > First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. > What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. > > Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? > > Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. > > In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. > For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. > Smita >>>done > > I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. > Smita >> removed UseVBMI2 flag > > In vm_version_x86.cpp you need to add more %s in print statement for new output. > Smita >>> done > > You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. > Smita >>> done > > You need to add intrinsics to Graal's test to ignore them: > > http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 > Smita >>>done > > Thanks, > Vladimir > > On 12/10/19 5:41 PM, Kamath, Smita wrote: >> Hi, >> >> >> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. >> >> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >> >> Link to webrev : >> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >> >> >> >> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. >> >> >> [1] >> https://software.intel.com/sites/default/files/managed/39/c5/325462-sd >> m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. >> 2C 5-471) >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development-e >> mulator >> >> >> Regards, >> >> Smita Kamath >> From xxinliu at amazon.com Fri Dec 20 03:47:38 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 20 Dec 2019 03:47:38 +0000 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler Message-ID: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Hi, Reviewers, Could you take a look at my webrev? I feel that those barrisetSet interfaces have nothing with c1_LIRAssembler. Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ I try to build on aarch64 and x86_64 and it?s fine. 
Thanks, --lx From goetz.lindenmaier at sap.com Fri Dec 20 10:12:53 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 20 Dec 2019 10:12:53 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Message-ID: Hi Vladimir, I refactored it a bit, see new webrev: http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/02/ Also, I filed 8236414: stringStream allocates on ResourceArea and C-heap https://bugs.openjdk.java.net/browse/JDK-8236414 ... but I'm not sure how to solve it, as stringStream inherits this capability, and having the char* on the ResourceArea as before is not good, either. Anyways, there are very few places where new is used with the stringStream, and the PrintInlining implementation is the only one where it's problematic. Maybe it would be better to simplify PrintInlining. Best regards, Goetz. > -----Original Message----- > From: Vladimir Kozlov > Sent: Donnerstag, 19. Dezember 2019 19:07 > To: Lindenmaier, Goetz ; hotspot-compiler- > dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > Please, file RFE for refactoring stringStream. > > Yes, the fix can go into JDK 14. > > But before that, I see the same pattern used 3 times in compile.cpp: > > + if (_print_inlining_stream != NULL) _print_inlining_stream- > >~stringStream(); > _print_inlining_stream = ; > > Can you use one function for that? Also our coding style requires to put body of > 'if' on separate line and use {}. > > thanks, > Vladimir > > On 12/19/19 4:52 AM, David Holmes wrote: > > On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: > >> Hi, > >> > >> yes, it is confusing that parts are on the arena, other parts > >> are allocated in the C-heap. > >> But usages which allocate the stringStream with new() are > >> rare, usually it's allocated on the stack making all this > >> more simple.? And the previous design was even more > >> error-prone. > >> Also, the whole way to print the inlining information > >> is quite complex, with strange usage of the copy constructor > >> of PrintInliningBuffer ... which reaches into GrowableArray > >> which should have a constructor that does not use the > >> copy constructor to initialize the elements ... > >> > >> I do not intend to change stringStream in this change. > >> So can I consider this reviewed from your side? Or at > >> least that there is no veto :)? > > > > Sorry I was trying to convey this is Reviewed, but I do think this needs further > work in the future. > > > > Thanks, > > David > > > >> Thanks and best regards, > >> ?? Goetz. > >> > >> > >>> -----Original Message----- > >>> From: David Holmes > >>> Sent: Donnerstag, 19. Dezember 2019 11:39 > >>> To: Lindenmaier, Goetz ; Vladimir Kozlov > >>> ; hotspot-compiler-dev at openjdk.java.net > >>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>> dev at openjdk.java.net> > >>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >>> '8224193: stringStream should not use Resouce Area'. > >>> > >>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > >>>> Hi David, Vladimir, > >>>> > >>>> stringStream is a ResourceObj, thus it lives on an arena. 
> >>>> This is uncritical, as it does not resize. > >>>> 8224193 only changed the allocation of the internal char*, > >>>> which always caused problems with resizing under > >>>> ResourceMarks that were not placed for the string but to > >>>> free other memory. > >>>> Thus stringStream must not be deallocated, and > >>>> also there was no mem leak before that change. > >>>> But we need to call the destructor to free the char*. > >>> > >>> I think we have a confusing mix of arena and C_heap usage with > >>> stringStream. Not clear to me why stringStream remains a resourceObj > >>> now? In many cases the stringStream is just local on the stack. In other > >>> cases if it is new'd then it should be C-heap same as the array and then > >>> you could delete it too. > >>> > >>> What you have may suffice to initially address the leak but I think this > >>> whole thing needs revisiting. > >>> > >>> Thanks, > >>> David > >>> > >>>> > >>>> Best regards, > >>>> ??? Goetz. > >>>> > >>>>> -----Original Message----- > >>>>> From: hotspot-runtime-dev >>> bounces at openjdk.java.net> > >>>>> On Behalf Of David Holmes > >>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 > >>>>> To: Vladimir Kozlov ; hotspot-compiler- > >>>>> dev at openjdk.java.net > >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>>> dev at openjdk.java.net> > >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >>>>> '8224193: stringStream should not use Resouce Area'. > >>>>> > >>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>>>>> CCing to Runtime group. > >>>>>> > >>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not > obvious. > >>>>>> I would definitively miss to do that if I use stringStreams in some new > >>>>>> code. > >>>>> > >>>>> But that is not a problem added by this changeset, the problem is that > >>>>> we're not deallocating these stringStreams even though we should be.? If > >>>>> you use a stringStream in new code you have to manage its lifecycle. > >>>>> > >>>>> That said why is this: > >>>>> > >>>>> ??? if (_print_inlining_stream != NULL) > >>>>> _print_inlining_stream->~stringStream(); > >>>>> > >>>>> not just: > >>>>> > >>>>> delete _print_inlining_stream; > >>>>> > >>>>> ? > >>>>> > >>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we > explicitly > >>>>> calling the destructor rather than calling delete? > >>>>> > >>>>> Cheers, > >>>>> David > >>>>> > >>>>>> May be someone can suggest some C++ trick to do that automatically. > >>>>>> Thanks, > >>>>>> Vladimir > >>>>>> > >>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I'm resending this with fixed bugId ... > >>>>>>> Sorry! > >>>>>>> > >>>>>>> Best regards, > >>>>>>> ? ?? Goetz > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with > new > >>> and > >>>>>>>> relied on the fact that all memory used is on the ResourceArea > cleaned > >>>>>>>> after the compilation. > >>>>>>>> > >>>>>>>> Since 8224193 the char* of the stringStream is malloced and thus > >>>>>>>> must be freed. No doing so manifests a memory leak. > >>>>>>>> This is only relevant if the corresponding tracing is active. > >>>>>>>> > >>>>>>>> To fix TraceLoopPredicate I added the destructor call > >>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>>>>> be walked to free all of them. 
> >>>>>>>> As the GrowableArray is on an arena no destructor is called for it. > >>>>>>>> > >>>>>>>> I also changed some as_string() calls to base() calls which reduced > >>>>>>>> memory need of the traces, and added a comment explaining the > >>>>>>>> constructor of GrowableArray that calls the copyconstructor for its > >>>>>>>> elements. > >>>>>>>> > >>>>>>>> Please review: > >>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >>>>> c2_tracing_mem_leak/01/ > >>>>>>>> > >>>>>>>> Best regards, > >>>>>>>> ? ?? Goetz. > >>>>>>> From vladimir.x.ivanov at oracle.com Fri Dec 20 10:19:41 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 20 Dec 2019 13:19:41 +0300 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: References: Message-ID: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Hi Sandhya, > Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ I'd prefer to see the check as a special case with a comment: MachOper* Matcher::specialize_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg) { assert(Matcher::is_generic_vector(generic_opnd), "not generic"); bool legacy = (generic_opnd->opcode() == LEGVEC); + if (!VM_Version::supports_avx512vlbwdq() && // KNL + is_temp && !legacy && (ideal_reg == Op_VecZ)) { + // Conservatively specialize 512bit vec TEMP operands to legVecZ (zmm0-15) on KNL. + return new legVecZOper(); + } if (legacy) { switch (ideal_reg) { case Op_VecS: return new legVecSOper(); Otherwise, looks good. I consider it as a stop-the-gap solution for 14. In 15 we need to get rid of it and adjust TEMP operand types in x86.ad instead. Please, file an RFE for it. Best regards, Vladimir Ivanov > > Best Regards, > Sandhya > > > From Pengfei.Li at arm.com Fri Dec 20 10:21:43 2019 From: Pengfei.Li at arm.com (Pengfei Li) Date: Fri, 20 Dec 2019 10:21:43 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> Message-ID: Hi, I'm back for this patch. > That is starting to sound very attractive. With a 64-bit address space I'm > finding it very hard to imagine a scenario in which we don't find a suitable > address. I think AOT-compiled code would still be OK, because it generates > different code, but we'd have to do some testing. Since Nick's recent metaspace reservation fix [1] has completely removed the use of r27, my patch becomes much simpler now. I have removed the condition of UseCompressedClassPointers, rebased the code and created a new webrev. Could you please help review again? Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.02/ [1] http://hg.openjdk.java.net/jdk/jdk/rev/dd4b4f273274 -- Thanks, Pengfei From martin.doerr at sap.com Fri Dec 20 10:58:13 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 10:58:13 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: Hi, builds were successful on all the platforms I have added. A lot of tests were running over night and I haven't seen any new issues. Can I push this version? 
http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ Best regards, Martin > -----Original Message----- > From: Aditya Mandaleeka > Sent: Donnerstag, 19. Dezember 2019 18:49 > To: Doerr, Martin ; Roland Westrelin > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > Thanks for updating the other platforms Martin. Those changes look right to > me. > > -Aditya > > -----Original Message----- > From: Doerr, Martin > Sent: Thursday, December 19, 2019 8:31 AM > To: Roland Westrelin ; Aditya Mandaleeka > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > Hi everybody, > > thanks for fixing this issue. > > I guess it's currently used on some platforms, but I think we should fix it for > all platforms. Otherwise it will break when using the parts which were only > fixed for x86. > > Here's my proposal: > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openj > dk.java.net%2F~mdoerr%2F8236179_C1_T_ADDRESS%2Fwebrev.01%2F&am > p;data=02%7C01%7Cadityam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d > 784a0dcc1%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637123698 > 748787948&sdata=r11YVMnHSLm1Ms1Ipbq4vPDOhIwlrM8fz1QlAl%2BU > WGY%3D&reserved=0 > > I'll run tests on more platforms. > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Roland Westrelin > > Sent: Donnerstag, 19. Dezember 2019 15:15 > > To: Aditya Mandaleeka ; hotspot compiler > > > > Cc: shenandoah-dev > > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > > > > Hi Aditya, > > > > AFAIK, it's a requirement that the patch be posted on the openjdk > > infrastructure. So here it is: > > > > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open > > > jdk.java.net%2F~roland%2F8236179%2Fwebrev.00%2F&data=02%7C01 > %7Cadi > > > tyam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf > 86f1 > > > 41af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=vQ1 > xR87EjA > > bf%2Bnwscs1c%2BpTqWLfeVODLz%2FleIsdmthU%3D&reserved=0 > > > > The change looks good to me but it would be good to check whether > > architectures other than x86 need a similar change. > > > > Roland. From aph at redhat.com Fri Dec 20 11:09:13 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 20 Dec 2019 12:09:13 +0100 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> Message-ID: <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> On 12/20/19 10:21 AM, Pengfei Li wrote: > Since Nick's recent metaspace reservation fix [1] has completely removed the use of r27, my patch becomes much simpler now. I have removed the condition of UseCompressedClassPointers, rebased the code and created a new webrev. Could you please help review again? What happens when we use Graal as a replacement for C2, particularly when Graal needs a heap base register? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Fri Dec 20 11:10:59 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 20 Dec 2019 12:10:59 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: <509cc49c-4e68-f2a0-6d11-4ece4ddfdd9e@redhat.com> Fine by me. Thanks, Roman > Hi, > > builds were successful on all the platforms I have added. A lot of tests were running over night and I haven't seen any new issues. > > Can I push this version? > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ > > Best regards, > Martin > > >> -----Original Message----- >> From: Aditya Mandaleeka >> Sent: Donnerstag, 19. Dezember 2019 18:49 >> To: Doerr, Martin ; Roland Westrelin >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS >> >> Thanks for updating the other platforms Martin. Those changes look right to >> me. >> >> -Aditya >> >> -----Original Message----- >> From: Doerr, Martin >> Sent: Thursday, December 19, 2019 8:31 AM >> To: Roland Westrelin ; Aditya Mandaleeka >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS >> >> Hi everybody, >> >> thanks for fixing this issue. >> >> I guess it's currently used on some platforms, but I think we should fix it for >> all platforms. Otherwise it will break when using the parts which were only >> fixed for x86. >> >> Here's my proposal: >> https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openj >> dk.java.net%2F~mdoerr%2F8236179_C1_T_ADDRESS%2Fwebrev.01%2F&am >> p;data=02%7C01%7Cadityam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d >> 784a0dcc1%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637123698 >> 748787948&sdata=r11YVMnHSLm1Ms1Ipbq4vPDOhIwlrM8fz1QlAl%2BU >> WGY%3D&reserved=0 >> >> I'll run tests on more platforms. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Roland Westrelin >>> Sent: Donnerstag, 19. Dezember 2019 15:15 >>> To: Aditya Mandaleeka ; hotspot compiler >>> >>> Cc: shenandoah-dev >>> Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS >>> >>> >>> Hi Aditya, >>> >>> AFAIK, it's a requirement that the patch be posted on the openjdk >>> infrastructure. So here it is: >>> >>> https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open >>> >> jdk.java.net%2F~roland%2F8236179%2Fwebrev.00%2F&data=02%7C01 >> %7Cadi >>> >> tyam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf >> 86f1 >>> >> 41af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=vQ1 >> xR87EjA >>> bf%2Bnwscs1c%2BpTqWLfeVODLz%2FleIsdmthU%3D&reserved=0 >>> >>> The change looks good to me but it would be good to check whether >>> architectures other than x86 need a similar change. >>> >>> Roland. 
> From tobias.hartmann at oracle.com Fri Dec 20 11:13:15 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 20 Dec 2019 12:13:15 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain Message-ID: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233164 http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ The problem is that the arraycopy ideal transformation does not correctly wire memory inputs on individual loads from a non-escaping src array. I was able to extract a test from the intermittently failing application that depends on indify string concat (test1). From that, I've created another test that does not depend on Strings (test2). Gory details based on test2: Two subsequent arraycopies copy data from two sources into the same destination. The ideal transformation replaces the first arraycopy by a forward and backward copy (because ArrayCopyNode::array_copy_test_overlap determines that the arrays may overlap). A PhiNode is added to select between the memory outputs of the forward and backward stores: http://hg.openjdk.java.net/jdk/jdk/file/2fbc66ef1a1d/src/hotspot/share/opto/arraycopynode.cpp#l617 574 StoreB === 559 572 553 573 [[ 568 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... 567 StoreB === 558 562 565 566 [[ 560 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... 575 Region === 575 558 559 [[ 575 576 ]] 576 Phi === 575 567 574 [[]] #memory Memory: @byte[int:>=0]:exact+any *, idx=6; The second arraycopy can't be optimized (yet) because the src operand and its length are not known but once Escape Analysis is executed, it determines that both source arrays are non-escaping. Now the ideal transformation is able to replace the second arraycopy by loads/stores as well. The memory slice for the first load is selected from the MergeMem based on its address type ('atp_src') which is the general byte[int:>=0] slice due to a CastPP that hides the type of the non-escaping source array. As a result, we wire it to the byte[int:>=0] memory Phi that was created when optimizing the first arraycopy: 305 CheckCastPP === 302 300 [[ 385 316 420 494 ]] #byte[int:1]:NotNull:exact *,iid=288 494 CastPP === 486 305 [[ 526 514 626 626 ]] #byte[int:>=0]:NotNull:exact * 626 AddP === _ 494 494 68 [[ 629 ]] 576 Phi === 575 567 574 [[ 560 628 629 ]] #memory Memory: @byte[int:>=0]:exact+any 629 LoadB === 508 576 626 [[]] @byte[int:>=0]:NotNull:exact+any *, idx=6; #byte That's obviously incorrect because now the LoadB has an address input into a non-escaping source with type byte[int:1], iid=288 and a memory input from the independent byte[int:>=0] slice: http://cr.openjdk.java.net/~thartmann/8233164/graph_Failure.png Basically, 629 LoadB uses memory from 47 AllocateArray when loading from an address into non-escaping 288 AllocateArray. We then hit an assert in MemNode::optimize_memory_chain. The fix is to use _src_type/_dest_type (introduced by JDK-8076188) as address types for the loads and stores. These will have the correct type if the source or destination array is non-escaping. The affected code is in there for a long time but I can only reproduce this back until JDK 13 b08 when JDK-8217990 was fixed. 
I don't think that fix is related but since the crash greatly depends on the order in which nodes are processed by IGVN, it probably just triggers the issue (for example, if we process a CastPP node first, it goes away and we get the correct type from the CheckCastPP). The code was also changed significantly by JDK-8210887 in JDK 12. Thanks, Tobias From vladimir.x.ivanov at oracle.com Fri Dec 20 11:25:30 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 20 Dec 2019 14:25:30 +0300 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> Message-ID: <9f1a459c-2f46-eb5d-ff3d-0258e6bd4888@oracle.com> > http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ Looks good. Best regards, Vladimir Ivanov > The problem is that the arraycopy ideal transformation does not correctly wire memory inputs on > individual loads from a non-escaping src array. > > I was able to extract a test from the intermittently failing application that depends on indify > string concat (test1). From that, I've created another test that does not depend on Strings (test2). > > Gory details based on test2: > Two subsequent arraycopies copy data from two sources into the same destination. The ideal > transformation replaces the first arraycopy by a forward and backward copy (because > ArrayCopyNode::array_copy_test_overlap determines that the arrays may overlap). A PhiNode is added > to select between the memory outputs of the forward and backward stores: > http://hg.openjdk.java.net/jdk/jdk/file/2fbc66ef1a1d/src/hotspot/share/opto/arraycopynode.cpp#l617 > > 574 StoreB === 559 572 553 573 [[ 568 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... > 567 StoreB === 558 562 565 566 [[ 560 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... > 575 Region === 575 558 559 [[ 575 576 ]] > 576 Phi === 575 567 574 [[]] #memory Memory: @byte[int:>=0]:exact+any *, idx=6; > > The second arraycopy can't be optimized (yet) because the src operand and its length are not known > but once Escape Analysis is executed, it determines that both source arrays are non-escaping. Now > the ideal transformation is able to replace the second arraycopy by loads/stores as well. > > The memory slice for the first load is selected from the MergeMem based on its address type > ('atp_src') which is the general byte[int:>=0] slice due to a CastPP that hides the type of the > non-escaping source array. As a result, we wire it to the byte[int:>=0] memory Phi that was created > when optimizing the first arraycopy: > > 305 CheckCastPP === 302 300 [[ 385 316 420 494 ]] #byte[int:1]:NotNull:exact *,iid=288 > 494 CastPP === 486 305 [[ 526 514 626 626 ]] #byte[int:>=0]:NotNull:exact * > 626 AddP === _ 494 494 68 [[ 629 ]] > 576 Phi === 575 567 574 [[ 560 628 629 ]] #memory Memory: @byte[int:>=0]:exact+any > 629 LoadB === 508 576 626 [[]] @byte[int:>=0]:NotNull:exact+any *, idx=6; #byte > > That's obviously incorrect because now the LoadB has an address input into a non-escaping source > with type byte[int:1], iid=288 and a memory input from the independent byte[int:>=0] slice: > http://cr.openjdk.java.net/~thartmann/8233164/graph_Failure.png > > Basically, 629 LoadB uses memory from 47 AllocateArray when loading from an address into > non-escaping 288 AllocateArray. We then hit an assert in MemNode::optimize_memory_chain. 
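To make that fix a little more concrete: the idea is that the expansion takes the address types for the generated loads and stores from the ArrayCopyNode's own _src_type/_dest_type fields, which carry the exact, instance-id-bearing array type once escape analysis has proven the array non-escaping, instead of re-deriving them from the CastPP-widened address. A rough sketch of the shape, with approximate names since the exact code is in the webrev rather than quoted here:

  // Illustrative fragment only (opto/arraycopynode.cpp area); field and helper
  // names are approximate, not the committed patch.
  const TypePtr* atp_src  = _src_type;   // e.g. byte[int:1]:NotNull:exact, iid=288 after EA
  const TypePtr* atp_dest = _dest_type;  // likewise for the destination array
  int src_alias  = phase->C->get_alias_index(atp_src);
  int dest_alias = phase->C->get_alias_index(atp_dest);
  // The per-element loads are then wired to mem->memory_at(src_alias) instead of the
  // generic byte[int:>=0] slice, so they can no longer latch onto the memory Phi that
  // was created when an unrelated arraycopy into the same destination was expanded.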
> > The fix is to use _src_type/_dest_type (introduced by JDK-8076188) as address types for the loads > and stores. These will have the correct type if the source or destination array is non-escaping. > > The affected code is in there for a long time but I can only reproduce this back until JDK 13 b08 > when JDK-8217990 was fixed. I don't think that fix is related but since the crash greatly depends on > the order in which nodes are processed by IGVN, it probably just triggers the issue (for example, if > we process a CastPP node first, it goes away and we get the correct type from the CheckCastPP). The > code was also changed significantly by JDK-8210887 in JDK 12. > > Thanks, > Tobias > From tobias.hartmann at oracle.com Fri Dec 20 11:34:50 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 20 Dec 2019 12:34:50 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <9f1a459c-2f46-eb5d-ff3d-0258e6bd4888@oracle.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> <9f1a459c-2f46-eb5d-ff3d-0258e6bd4888@oracle.com> Message-ID: Hi Vladimir, thanks for the quick review! Best regards, Tobias On 20.12.19 12:25, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> The problem is that the arraycopy ideal transformation does not correctly wire memory inputs on >> individual loads from a non-escaping src array. >> >> I was able to extract a test from the intermittently failing application that depends on indify >> string concat (test1). From that, I've created another test that does not depend on Strings (test2). >> >> Gory details based on test2: >> Two subsequent arraycopies copy data from two sources into the same destination. The ideal >> transformation replaces the first arraycopy by a forward and backward copy (because >> ArrayCopyNode::array_copy_test_overlap determines that the arrays may overlap). A PhiNode is added >> to select between the memory outputs of the forward and backward stores: >> http://hg.openjdk.java.net/jdk/jdk/file/2fbc66ef1a1d/src/hotspot/share/opto/arraycopynode.cpp#l617 >> >> ? 574??? StoreB??? ===? 559? 572? 553? 573? [[ 568? 576 ]]? @byte[int:0..max-2]:NotNull:exact+any ... >> ? 567??? StoreB??? ===? 558? 562? 565? 566? [[ 560? 576 ]]? @byte[int:0..max-2]:NotNull:exact+any ... >> ? 575??? Region??? ===? 575? 558? 559? [[ 575? 576 ]] >> ? 576??? Phi??? ===? 575? 567? 574? [[]]? #memory? Memory: @byte[int:>=0]:exact+any *, idx=6; >> >> The second arraycopy can't be optimized (yet) because the src operand and its length are not known >> but once Escape Analysis is executed, it determines that both source arrays are non-escaping. Now >> the ideal transformation is able to replace the second arraycopy by loads/stores as well. >> >> The memory slice for the first load is selected from the MergeMem based on its address type >> ('atp_src') which is the general byte[int:>=0] slice due to a CastPP that hides the type of the >> non-escaping source array. As a result, we wire it to the byte[int:>=0] memory Phi that was created >> when optimizing the first arraycopy: >> >> ? 305??? CheckCastPP ===? 302? 300? [[ 385? 316? 420? 494 ]]? #byte[int:1]:NotNull:exact *,iid=288 >> ? 494??? CastPP??? ===? 486? 305? [[ 526? 514? 626? 626 ]]? #byte[int:>=0]:NotNull:exact * >> ? 626??? AddP??? === _? 494? 494? 68? [[ 629 ]] >> ? 576??? Phi??? ===? 575? 567? 574? 
[[ 560? 628? 629 ]]? #memory? Memory: @byte[int:>=0]:exact+any >> ? 629??? LoadB??? ===? 508? 576? 626? [[]]? @byte[int:>=0]:NotNull:exact+any *, idx=6; #byte >> >> That's obviously incorrect because now the LoadB has an address input into a non-escaping source >> with type byte[int:1], iid=288 and a memory input from the independent byte[int:>=0] slice: >> http://cr.openjdk.java.net/~thartmann/8233164/graph_Failure.png >> >> Basically, 629 LoadB uses memory from 47 AllocateArray when loading from an address into >> non-escaping 288 AllocateArray. We then hit an assert in MemNode::optimize_memory_chain. >> >> The fix is to use _src_type/_dest_type (introduced by JDK-8076188) as address types for the loads >> and stores. These will have the correct type if the source or destination array is non-escaping. >> >> The affected code is in there for a long time but I can only reproduce this back until JDK 13 b08 >> when JDK-8217990 was fixed. I don't think that fix is related but since the crash greatly depends on >> the order in which nodes are processed by IGVN, it probably just triggers the issue (for example, if >> we process a CastPP node first, it goes away and we get the correct type from the CheckCastPP). The >> code was also changed significantly by JDK-8210887 in JDK 12. >> >> Thanks, >> Tobias >> From rwestrel at redhat.com Fri Dec 20 14:01:20 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 15:01:20 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: <87bls3ksen.fsf@redhat.com> Hi Martin, > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ c1_LIRAssembler_aarch64.cpp and c1_LIRAssembler_s390.cpp: shouldn't stack2reg be fixed too? Roland. From martin.doerr at sap.com Fri Dec 20 14:47:59 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 14:47:59 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <87bls3ksen.fsf@redhat.com> References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> Message-ID: Hi Roland, good catch. I'll push this version if there are no objections: http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ Best regards, Martin > -----Original Message----- > From: Roland Westrelin > Sent: Freitag, 20. Dezember 2019 15:01 > To: Doerr, Martin ; Aditya Mandaleeka > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > Hi Martin, > > > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ > > c1_LIRAssembler_aarch64.cpp and c1_LIRAssembler_s390.cpp: shouldn't > stack2reg be fixed too? > > Roland. From martin.doerr at sap.com Fri Dec 20 15:03:44 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 15:03:44 +0000 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> References: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Message-ID: Hi lx, PPC and s390 parts are ok. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Liu, Xin > Sent: Freitag, 20. Dezember 2019 04:48 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > c1_LIRAssembler > > Hi, Reviewers, > > Could you take a look at my webrev? 
I feel that those BarrierSet interfaces > have nothing with c1_LIRAssembler. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > > I try to build on aarch64 and x86_64 and it's fine. > > Thanks, > --lx From rwestrel at redhat.com Fri Dec 20 15:53:15 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 16:53:15 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> Message-ID: <878sn7kn84.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ I'm wondering whether this problem could show up elsewhere and if a more generic fix would be needed (maybe EA should set the type of the CastPP so there's no inconsistency when IGVN runs). The fix looks good to address this particular problem but I think this deserves a follow up bug to investigate this further. Roland. From rwestrel at redhat.com Fri Dec 20 15:58:53 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 16:58:53 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> Message-ID: <875zibkmyq.fsf@redhat.com> > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ That looks good to me. Roland. From rwestrel at redhat.com Fri Dec 20 16:48:24 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 17:48:24 +0100 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: References: <87o8w8ptsq.fsf@redhat.com> Message-ID: <8736dfkko7.fsf@redhat.com> Thanks for reviewing this, Vladimir. > cfgnode.cpp - should we also check for is_top() to set `doit = false` and bailout? This: http://cr.openjdk.java.net/~roland/8231291/webrev.02/ > Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. > Did you find some issues? I hit a crash when doing some CTW testing because an argument to must_be_not_null() was already known to be not null. I suppose it's unrelated to that change and something changed in the libraries instead. Roland. From rwestrel at redhat.com Fri Dec 20 16:56:37 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 17:56:37 +0100 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <3dd0917e-f307-5f22-dd68-786c90ec47a5@oracle.com> References: <87o8w8ptsq.fsf@redhat.com> <3dd0917e-f307-5f22-dd68-786c90ec47a5@oracle.com> Message-ID: <87zhfnj5q2.fsf@redhat.com> Hi Vladimir, Thanks for reviewing this. > As I understand, the intention of the change you propose is to perform > complete loop unrolling earlier so EA can benefit from it. Yes. > It looks like LoopOptsMaxUnroll is a shortened version of > IdealLoopTree::iteration_split/iteration_split_impl(). Yes. > Have you considered factoring out the common code? Right now, it's hard > to correlate the checks for LoopOptsMaxUnroll with iteration_split() and > there's a risk they'll diverge eventually. No I haven't. But it seems it's either we duplicate the code as I did or we add some flag to iteration_split() and make some of the code there conditional. Wouldn't that affect clarity, and for what's overall a fairly simple change?
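For readers following this exchange (which continues with a few more questions below), the overall shape being discussed is roughly the following sketch, pieced together from the checks named in the thread — is_valid_counted_loop(), is_normal_loop(), policy_maximally_unroll(), do_maximally_unroll() — and not copied from webrev.02:

  // Sketch of an early "maximally unroll only" loop pass run before Escape Analysis
  // (illustrative; the real code lives in IdealLoopTree/PhaseIdealLoop and differs in detail).
  bool IdealLoopTree::iteration_split_max_unroll_only(PhaseIdealLoop* phase, Node_List& old_new) {
    // Recurse into inner loops and siblings first, as iteration_split() does.
    if (_child != NULL && !_child->iteration_split_max_unroll_only(phase, old_new)) return false;
    if (_next  != NULL && !_next->iteration_split_max_unroll_only(phase, old_new))  return false;

    if (!_head->is_CountedLoop())     return true;   // only counted loops can be fully unrolled
    CountedLoopNode* cl = _head->as_CountedLoop();
    if (!cl->is_valid_counted_loop()) return true;   // ignore various kinds of broken loops
    if (!cl->is_normal_loop())        return true;   // no pre-/post-loops are expected this early

    if (policy_maximally_unroll(phase)) {
      // Fully unroll now so Escape Analysis sees the flattened body; profile-based
      // exit-probability adjustment and unswitching are deliberately skipped here.
      phase->do_maximally_unroll(this, old_new);
    }
    return true;
  }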
> Do you need the following steps from the original version? > > ================================================ > // Look for loop-exit tests with my 50/50 guesses from the Parsing stage. > // Replace with a 1-in-10 exit guess. > if (!is_root() && is_loop()) { > adjust_loop_exit_prob(phase); > } > > // Compute loop trip count from profile data > compute_profile_trip_cnt(phase); > > No use of profiling data since full unrolling is happening anyway? Right. > ======================== > if (!cl->is_valid_counted_loop()) return true; // Ignore various > kinds of broken loops policy_maximally_unroll() has that check. > ======================== > // Do nothing special to pre- and post- loops > if (cl->is_pre_loop() || cl->is_post_loop()) return true; > > I assume there are no pre-/post-loops exist at that point, so these > checks are redundant. Turn them into asserts? I added a check for is_normal_loop() to be safe. > ======================== > if (cl->is_normal_loop()) { > if (policy_unswitching(phase)) { > phase->do_unswitching(this, old_new); > return true; > } > if (policy_maximally_unroll(phase)) { > // Here we did some unrolling and peeling. Eventually we will > // completely unroll this loop and it will no longer be a loop. > phase->do_maximally_unroll(this, old_new); > return true; > } > > You don't perform loop unswitching at all. So, the order of operations > changes. Do you see any problems with that? I'm not sure what to think about that one. Yes, I suppose it could be that unswitching helps but if we unswitch we have to perform 2 passes of loop opts to get to the maximally unrolled loop. Isn't that more disruptive that we would want? > ======================== > > > src/hotspot/share/opto/compile.cpp > // Perform escape analysis > if (_do_escape_analysis && ConnectionGraph::has_candidates(this)) { > if (has_loops()) { > // Cleanup graph (remove dead nodes). > TracePhase tp("idealLoop", &timers[_t_idealLoop]); > - PhaseIdealLoop::optimize(igvn, LoopOptsNone); > + PhaseIdealLoop::optimize(igvn, LoopOptsMaxUnroll); > if (major_progress()) print_method(PHASE_PHASEIDEAL_BEFORE_EA, 2); > if (failing()) return; > } > ConnectionGraph::do_analysis(this, &igvn); > > Does it make sense to do more elaborate checks before performing early > full loop unrolling? Like whether candidates are used inside loop bodies? We would maximally unroll the loop anyway, only a bit later so I don't think we need to make this more involved that it needs to be. new webrev: http://cr.openjdk.java.net/~roland/8231291/webrev.02/ Roland. From navy.xliu at gmail.com Fri Dec 20 17:24:35 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Fri, 20 Dec 2019 09:24:35 -0800 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: References: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Message-ID: Martin? Thank you very much. May I know how to validate Sparc? I don't have any SPARC machine to access. Thanks, --lx On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin wrote: > Hi lx, > > PPC and s390 parts are ok. > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Liu, Xin > > Sent: Freitag, 20. Dezember 2019 04:48 > > To: 'hotspot-compiler-dev at openjdk.java.net' > dev at openjdk.java.net> > > Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > > c1_LIRAssembler > > > > Hi, Reviewers, > > > > Could you take a look at my webrev? I feel that those barrisetSet > interfaces > > have nothing with c1_LIRAssembler. 
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > > Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > > > > I try to build on aarch64 and x86_64 and it?s fine. > > > > Thanks, > > --lx > > From martin.doerr at sap.com Fri Dec 20 18:54:47 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 18:54:47 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <875zibkmyq.fsf@redhat.com> References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> Message-ID: Hi Roland, thanks for reviewing it. Pushed to jdk/jdk. I guess we'll have to backport it after some testing time. Best regards, Martin > -----Original Message----- > From: Roland Westrelin > Sent: Freitag, 20. Dezember 2019 16:59 > To: Doerr, Martin ; Aditya Mandaleeka > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ > > That looks good to me. > > Roland. From sandhya.viswanathan at intel.com Fri Dec 20 19:35:13 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 20 Dec 2019 19:35:13 +0000 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> References: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Message-ID: Hi Vladimir, Thanks for the review. Please find the updated webrev below for JDK 14: JBS: https://bugs.openjdk.java.net/browse/JDK-8236364 Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.01/ RFE filed for JDK 15: https://bugs.openjdk.java.net/browse/JDK-8236446 Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Friday, December 20, 2019 2:20 AM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) Hi Sandhya, > Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ I'd prefer to see the check as a special case with a comment: MachOper* Matcher::specialize_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg) { assert(Matcher::is_generic_vector(generic_opnd), "not generic"); bool legacy = (generic_opnd->opcode() == LEGVEC); + if (!VM_Version::supports_avx512vlbwdq() && // KNL + is_temp && !legacy && (ideal_reg == Op_VecZ)) { + // Conservatively specialize 512bit vec TEMP operands to legVecZ (zmm0-15) on KNL. + return new legVecZOper(); + } if (legacy) { switch (ideal_reg) { case Op_VecS: return new legVecSOper(); Otherwise, looks good. I consider it as a stop-the-gap solution for 14. In 15 we need to get rid of it and adjust TEMP operand types in x86.ad instead. Please, file an RFE for it. Best regards, Vladimir Ivanov > > Best Regards, > Sandhya > > > From rkennke at redhat.com Fri Dec 20 19:58:42 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 20 Dec 2019 20:58:42 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> Message-ID: <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> Hi Martin, > thanks for reviewing it. Pushed to jdk/jdk. Thanks a lot! > I guess we'll have to backport it after some testing time. 
Yes, we're gonna need it in 11u and 8u. Thanks and have a nice weekend (and xmas, etc, if you're also taking time off)! Cheers, Roman > Best regards, > Martin > > >> -----Original Message----- >> From: Roland Westrelin >> Sent: Freitag, 20. Dezember 2019 16:59 >> To: Doerr, Martin ; Aditya Mandaleeka >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS >> >> >>> http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ >> >> That looks good to me. >> >> Roland. > From adityam at microsoft.com Fri Dec 20 20:06:14 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Fri, 20 Dec 2019 20:06:14 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> Message-ID: Thanks again to everyone who helped get this change in! I am happy to help backport it as well if it can wait until January. Thanks, Aditya -----Original Message----- From: Roman Kennke Sent: Friday, December 20, 2019 11:59 AM To: Doerr, Martin ; Roland Westrelin ; Aditya Mandaleeka ; hotspot compiler Cc: shenandoah-dev Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS Hi Martin, > thanks for reviewing it. Pushed to jdk/jdk. Thanks a lot! > I guess we'll have to backport it after some testing time. Yes, we're gonna need it in 11u and 8u. Thanks and have a nice weekend (and xmas, etc, if you're also taking time off)! Cheers, Roman > Best regards, > Martin > > >> -----Original Message----- >> From: Roland Westrelin >> Sent: Freitag, 20. Dezember 2019 16:59 >> To: Doerr, Martin ; Aditya Mandaleeka >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with >> T_ADDRESS >> >> >>> http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ >> >> That looks good to me. >> >> Roland. > From rkennke at redhat.com Fri Dec 20 20:37:24 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 20 Dec 2019 21:37:24 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> Message-ID: Hi Aditya, > Thanks again to everyone who helped get this change in! I am happy to help backport it as well if it can wait until January. Thank *you* for figuring this out in the first place. This seems a rather serious bug for Shenandoah GC (and I'm still a bit surprised how we haven't seen it yet). I just realized we're also gonna need it in JDK14. I am not even quite sure what the process for this would be. We'll figure it out. Thank you! Roman > Thanks, > Aditya > > -----Original Message----- > From: Roman Kennke > Sent: Friday, December 20, 2019 11:59 AM > To: Doerr, Martin ; Roland Westrelin ; Aditya Mandaleeka ; hotspot compiler > Cc: shenandoah-dev > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > Hi Martin, > >> thanks for reviewing it. Pushed to jdk/jdk. > > Thanks a lot! > >> I guess we'll have to backport it after some testing time. > > Yes, we're gonna need it in 11u and 8u. > > Thanks and have a nice weekend (and xmas, etc, if you're also taking time off)! 
> > Cheers, > Roman > >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: Roland Westrelin >>> Sent: Freitag, 20. Dezember 2019 16:59 >>> To: Doerr, Martin ; Aditya Mandaleeka >>> ; hotspot compiler >> dev at openjdk.java.net> >>> Cc: shenandoah-dev >>> Subject: RE: RFR: 8236179: C1 register allocation error with >>> T_ADDRESS >>> >>> >>>> http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ >>> >>> That looks good to me. >>> >>> Roland. >> > From vladimir.kozlov at oracle.com Fri Dec 20 21:14:54 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 13:14:54 -0800 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Message-ID: On 12/20/19 2:12 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > I refactored it a bit, see new webrev: > http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/02/ Looks good to me. > > Also, I filed > 8236414: stringStream allocates on ResourceArea and C-heap > https://bugs.openjdk.java.net/browse/JDK-8236414 > ... but I'm not sure how to solve it, as stringStream inherits > this capability, and having the char* on the ResourceArea > as before is not good, either. > Anyways, there are very few places where new is used > with the stringStream, and the PrintInlining implementation > is the only one where it's problematic. Maybe it would be > better to simplify PrintInlining. Yes, simplifying PrintInlining is also option. Thanks, Vladimir > > Best regards, > Goetz. > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Donnerstag, 19. Dezember 2019 19:07 >> To: Lindenmaier, Goetz ; hotspot-compiler- >> dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> Please, file RFE for refactoring stringStream. >> >> Yes, the fix can go into JDK 14. >> >> But before that, I see the same pattern used 3 times in compile.cpp: >> >> + if (_print_inlining_stream != NULL) _print_inlining_stream- >>> ~stringStream(); >> _print_inlining_stream = ; >> >> Can you use one function for that? Also our coding style requires to put body of >> 'if' on separate line and use {}. >> >> thanks, >> Vladimir >> >> On 12/19/19 4:52 AM, David Holmes wrote: >>> On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> yes, it is confusing that parts are on the arena, other parts >>>> are allocated in the C-heap. >>>> But usages which allocate the stringStream with new() are >>>> rare, usually it's allocated on the stack making all this >>>> more simple.? And the previous design was even more >>>> error-prone. >>>> Also, the whole way to print the inlining information >>>> is quite complex, with strange usage of the copy constructor >>>> of PrintInliningBuffer ... which reaches into GrowableArray >>>> which should have a constructor that does not use the >>>> copy constructor to initialize the elements ... >>>> >>>> I do not intend to change stringStream in this change. >>>> So can I consider this reviewed from your side? Or at >>>> least that there is no veto :)? >>> >>> Sorry I was trying to convey this is Reviewed, but I do think this needs further >> work in the future. 
>>> >>> Thanks, >>> David >>> >>>> Thanks and best regards, >>>> ?? Goetz. >>>> >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Donnerstag, 19. Dezember 2019 11:39 >>>>> To: Lindenmaier, Goetz ; Vladimir Kozlov >>>>> ; hotspot-compiler-dev at openjdk.java.net >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>> dev at openjdk.java.net> >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>>> '8224193: stringStream should not use Resouce Area'. >>>>> >>>>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>>>>> Hi David, Vladimir, >>>>>> >>>>>> stringStream is a ResourceObj, thus it lives on an arena. >>>>>> This is uncritical, as it does not resize. >>>>>> 8224193 only changed the allocation of the internal char*, >>>>>> which always caused problems with resizing under >>>>>> ResourceMarks that were not placed for the string but to >>>>>> free other memory. >>>>>> Thus stringStream must not be deallocated, and >>>>>> also there was no mem leak before that change. >>>>>> But we need to call the destructor to free the char*. >>>>> >>>>> I think we have a confusing mix of arena and C_heap usage with >>>>> stringStream. Not clear to me why stringStream remains a resourceObj >>>>> now? In many cases the stringStream is just local on the stack. In other >>>>> cases if it is new'd then it should be C-heap same as the array and then >>>>> you could delete it too. >>>>> >>>>> What you have may suffice to initially address the leak but I think this >>>>> whole thing needs revisiting. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> >>>>>> Best regards, >>>>>> ??? Goetz. >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: hotspot-runtime-dev >>>> bounces at openjdk.java.net> >>>>>>> On Behalf Of David Holmes >>>>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>>>>> To: Vladimir Kozlov ; hotspot-compiler- >>>>>>> dev at openjdk.java.net >>>>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>>>> dev at openjdk.java.net> >>>>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>>>>> '8224193: stringStream should not use Resouce Area'. >>>>>>> >>>>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>>>>> CCing to Runtime group. >>>>>>>> >>>>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not >> obvious. >>>>>>>> I would definitively miss to do that if I use stringStreams in some new >>>>>>>> code. >>>>>>> >>>>>>> But that is not a problem added by this changeset, the problem is that >>>>>>> we're not deallocating these stringStreams even though we should be.? If >>>>>>> you use a stringStream in new code you have to manage its lifecycle. >>>>>>> >>>>>>> That said why is this: >>>>>>> >>>>>>> ??? if (_print_inlining_stream != NULL) >>>>>>> _print_inlining_stream->~stringStream(); >>>>>>> >>>>>>> not just: >>>>>>> >>>>>>> delete _print_inlining_stream; >>>>>>> >>>>>>> ? >>>>>>> >>>>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we >> explicitly >>>>>>> calling the destructor rather than calling delete? >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>>> May be someone can suggest some C++ trick to do that automatically. >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I'm resending this with fixed bugId ... >>>>>>>>> Sorry! >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> ? ?? 
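The leak pattern being debated in this 8235998 thread boils down to a few lines. The following is a simplified illustration of the point, not the committed fix: the stringStream object itself is a ResourceObj on an arena, but since JDK-8224193 its character buffer is C-heap allocated, so the destructor has to run even though the object itself is never freed.

  // Simplified illustration of the 8235998 situation (not the actual compile.cpp code).
  stringStream* ss = new stringStream();             // ResourceObj: the object lands on an arena
  ss->print("inlining trace for %s", "someMethod");  // internal char* is malloc'ed since 8224193
  // ...
  // The arena is simply reset at the end of the compile, so nothing ever runs the
  // destructor; 'delete' is not an option for an arena-allocated ResourceObj, hence
  // the explicit destructor call that frees the C-heap buffer:
  ss->~stringStream();
  // A GrowableArray of such streams kept on an arena runs no element destructors either,
  // which is why each element needs the same explicit cleanup before the array is dropped.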
Goetz >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with >> new >>>>> and >>>>>>>>>> relied on the fact that all memory used is on the ResourceArea >> cleaned >>>>>>>>>> after the compilation. >>>>>>>>>> >>>>>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>>>>> >>>>>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>>>>> be walked to free all of them. >>>>>>>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>>>>>>> >>>>>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>>>>> memory need of the traces, and added a comment explaining the >>>>>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>>>>> elements. >>>>>>>>>> >>>>>>>>>> Please review: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>>>>> c2_tracing_mem_leak/01/ >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> ? ?? Goetz. >>>>>>>>> From vladimir.kozlov at oracle.com Fri Dec 20 21:36:07 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 13:36:07 -0800 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: References: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Message-ID: <2b56976e-a705-ce77-4dee-aa3611c83f67@oracle.com> Hi Lx First, when you post RFR, please, include bug id in email's Subject. You can use jdk/submit testing to verify builds on SPARC. We are still building on it, with warning. Changes seems fine to me but make sure verify that it builds without PCH. Regards, Vladimir On 12/20/19 9:24 AM, Liu Xin wrote: > Martin? > > Thank you very much. May I know how to validate Sparc? I don't have any > SPARC machine to access. > > Thanks, > --lx > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin wrote: > >> Hi lx, >> >> PPC and s390 parts are ok. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Liu, Xin >>> Sent: Freitag, 20. Dezember 2019 04:48 >>> To: 'hotspot-compiler-dev at openjdk.java.net' >> dev at openjdk.java.net> >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in >>> c1_LIRAssembler >>> >>> Hi, Reviewers, >>> >>> Could you take a look at my webrev? I feel that those barrisetSet >> interfaces >>> have nothing with c1_LIRAssembler. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ >>> >>> I try to build on aarch64 and x86_64 and it?s fine. >>> >>> Thanks, >>> --lx >> >> From vladimir.kozlov at oracle.com Fri Dec 20 21:42:31 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 13:42:31 -0800 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: References: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Message-ID: On 12/20/19 11:35 AM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks for the review. 
Please find the updated webrev below for JDK 14: > JBS: https://bugs.openjdk.java.net/browse/JDK-8236364 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.01/ Looks good. Thanks, Vladimir K > > > RFE filed for JDK 15: > https://bugs.openjdk.java.net/browse/JDK-8236446 > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Friday, December 20, 2019 2:20 AM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) > > Hi Sandhya, > >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ > > I'd prefer to see the check as a special case with a comment: > > MachOper* Matcher::specialize_generic_vector_operand(MachOper* > generic_opnd, uint ideal_reg) { > assert(Matcher::is_generic_vector(generic_opnd), "not generic"); > bool legacy = (generic_opnd->opcode() == LEGVEC); > + if (!VM_Version::supports_avx512vlbwdq() && // KNL > + is_temp && !legacy && (ideal_reg == Op_VecZ)) { > + // Conservatively specialize 512bit vec TEMP operands to legVecZ > (zmm0-15) on KNL. > + return new legVecZOper(); > + } > if (legacy) { > switch (ideal_reg) { > case Op_VecS: return new legVecSOper(); > > Otherwise, looks good. > > I consider it as a stop-the-gap solution for 14. In 15 we need to get rid of it and adjust TEMP operand types in x86.ad instead. Please, file an RFE for it. > > Best regards, > Vladimir Ivanov > >> >> Best Regards, >> Sandhya >> >> >> From vladimir.x.ivanov at oracle.com Fri Dec 20 21:51:12 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 21 Dec 2019 00:51:12 +0300 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: References: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Message-ID: <242a71d4-093e-513b-44d9-0fa119a86dd1@oracle.com> > Webrev:http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.01/ Looks good. > RFE filed for JDK 15: > https://bugs.openjdk.java.net/browse/JDK-8236446 Thanks! Best regards, Vladimir Ivanov From smita.kamath at intel.com Fri Dec 20 21:52:17 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Fri, 20 Dec 2019 21:52:17 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> Message-ID: Hi Vladimir, Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ Regards, Smita -----Original Message----- From: Vladimir Kozlov Sent: Thursday, December 19, 2019 5:17 PM To: Kamath, Smita Cc: Viswanathan, Sandhya ; 'hotspot compiler' Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: StubRoutines::_bigIntegerRightShiftWorker StubRoutines::_bigIntegerLeftShiftWorker Smita>>>done In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). 
Smita>>>done Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ On 12/18/19 6:33 PM, Kamath, Smita wrote: > Hi Vladimir, > > I have made the code changes you suggested (please look at the email below). > I have also enabled the intrinsic to run only when VBMI2 feature is available. > The intrinsic shows gains of >1.5x above 4k bit BigInteger. > > Webrev link: > https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ > > Thanks, > Smita > > -----Original Message----- > From: Vladimir Kozlov > Sent: Wednesday, December 11, 2019 10:55 AM > To: Kamath, Smita ; 'hotspot compiler' > ; Viswanathan, Sandhya > > Subject: Re: RFR(M):8167065: Add intrinsic support for double > precision shifting on x86_64 > > Hi Kamath, > > First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. > What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. > > Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? > > Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. > > In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. > For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. > Smita >>>done > > I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. > Smita >> removed UseVBMI2 flag > > In vm_version_x86.cpp you need to add more %s in print statement for new output. > Smita >>> done > > You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. > Smita >>> done > > You need to add intrinsics to Graal's test to ignore them: > > http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. > vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr > aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 > Smita >>>done > > Thanks, > Vladimir > > On 12/10/19 5:41 PM, Kamath, Smita wrote: >> Hi, >> >> >> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. >> >> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >> >> Link to webrev : >> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >> >> >> >> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. >> >> >> [1] >> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >> Vol. 
>> 2C 5-471) >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development- >> e >> mulator >> >> >> Regards, >> >> Smita Kamath >> From mikael.vidstedt at oracle.com Fri Dec 20 21:54:47 2019 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Fri, 20 Dec 2019 13:54:47 -0800 Subject: RFR(T): 8236449: Problem list compiler/jsr292/ContinuousCallSiteTargetChange.java on solaris-sparcv9 Message-ID: compiler/jsr292/ContinuousCallSiteTargetChange.java is timing out intermittently on solaris-sparcv9, frequently enough to cause some unwanted noise. Let?s problem list it: Bug: https://bugs.openjdk.java.net/browse/JDK-8236449 Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8236449/webrev.00/open/webrev/ Cheers, Mikael From igor.ignatyev at oracle.com Fri Dec 20 22:02:20 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Fri, 20 Dec 2019 14:02:20 -0800 Subject: RFR(T): 8236449: Problem list compiler/jsr292/ContinuousCallSiteTargetChange.java on solaris-sparcv9 In-Reply-To: References: Message-ID: <35692795-874C-40DA-8789-7FE449E7B906@oracle.com> LGTM ? Igor > On Dec 20, 2019, at 1:55 PM, Mikael Vidstedt wrote: > > ? > compiler/jsr292/ContinuousCallSiteTargetChange.java is timing out intermittently on solaris-sparcv9, frequently enough to cause some unwanted noise. Let?s problem list it: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236449 > Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8236449/webrev.00/open/webrev/ > > Cheers, > Mikael From vladimir.kozlov at oracle.com Fri Dec 20 22:10:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 14:10:02 -0800 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <8736dfkko7.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> <8736dfkko7.fsf@redhat.com> Message-ID: <7f4bcc83-e049-20ad-deef-6c5e15819d10@oracle.com> Looks good. Thanks, Vladimir K On 12/20/19 8:48 AM, Roland Westrelin wrote: > > Thanks for reviewing this, Vladimir. > >> cfgnode.cpp - should we also check for is_top() to set `doit = false` and bailout? > > This: > > http://cr.openjdk.java.net/~roland/8231291/webrev.02/ > >> Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. >> Did you find some issues? > > I hit a crash when doing some CTW testing because an argument to > must_be_not_null() was already known to be not null. I suppose it's > unrelated to that change and something changed in the libraries instead. > > Roland. > From vladimir.kozlov at oracle.com Fri Dec 20 22:19:54 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 14:19:54 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> Message-ID: <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> We should have added core-libs to review since you modified BigInteger.java. webrev02 looks good to me. Let me test it. Thanks, Vladimir On 12/20/19 1:52 PM, Kamath, Smita wrote: > Hi Vladimir, > > Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). 
> Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ > > Regards, > Smita > > -----Original Message----- > From: Vladimir Kozlov > Sent: Thursday, December 19, 2019 5:17 PM > To: Kamath, Smita > Cc: Viswanathan, Sandhya ; 'hotspot compiler' > Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 > > We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: > > StubRoutines::_bigIntegerRightShiftWorker > StubRoutines::_bigIntegerLeftShiftWorker > Smita>>>done > > In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). > Smita>>>done > > Thanks, > Vladimir > > [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ > > On 12/18/19 6:33 PM, Kamath, Smita wrote: >> Hi Vladimir, >> >> I have made the code changes you suggested (please look at the email below). >> I have also enabled the intrinsic to run only when VBMI2 feature is available. >> The intrinsic shows gains of >1.5x above 4k bit BigInteger. >> >> Webrev link: >> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >> >> Thanks, >> Smita >> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Wednesday, December 11, 2019 10:55 AM >> To: Kamath, Smita ; 'hotspot compiler' >> ; Viswanathan, Sandhya >> >> Subject: Re: RFR(M):8167065: Add intrinsic support for double >> precision shifting on x86_64 >> >> Hi Kamath, >> >> First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. >> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >> >> Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >> >> Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. >> >> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. >> Smita >>>done >> >> I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. >> Smita >> removed UseVBMI2 flag >> >> In vm_version_x86.cpp you need to add more %s in print statement for new output. >> Smita >>> done >> >> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >> Smita >>> done >> >> You need to add intrinsics to Graal's test to ignore them: >> >> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr >> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >> Smita >>>done >> >> Thanks, >> Vladimir >> >> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>> Hi, >>> >>> >>> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. 
>>> >>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>> >>> Link to webrev : >>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>> >>> >>> >>> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. >>> >>> >>> [1] >>> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >>> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >>> Vol. >>> 2C 5-471) >>> >>> [2] >>> https://software.intel.com/en-us/articles/intel-software-development- >>> e >>> mulator >>> >>> >>> Regards, >>> >>> Smita Kamath >>> From vladimir.x.ivanov at oracle.com Fri Dec 20 22:22:13 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 21 Dec 2019 01:22:13 +0300 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <8736dfkko7.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> <8736dfkko7.fsf@redhat.com> Message-ID: <7bc32dc2-1612-cb77-a467-ac67987b148c@oracle.com> >> Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. >> Did you find some issues? > > I hit a crash when doing some CTW testing because an argument to > must_be_not_null() was already known to be not null. I suppose it's > unrelated to that change and something changed in the libraries instead. That's interesting. I initially thought it's just to simplify the IR and avoid keeping redundant null check until loop opts are over. Doesn't it signal about a more general problem with GraphKit::must_be_not_null()? What if a value becomes non-null after GraphKit::must_be_not_null()? Is there a chance to hit the very same problem you observed? Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Dec 20 23:47:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 15:47:21 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: Hi Smita, You have typo (should be supports_vbmi2): src/hotspot/cpu/x86/assembler_x86.cpp:6547:22: error: 'support_vbmi2' is not a member of 'VM_Version' assert(VM_Version::support_vbmi2(), "requires vbmi2"); ^~~~~~~~~~~~~ Debug build failed. I am retesting with local fix. Regards, Vladimir K On 12/20/19 2:19 PM, Vladimir Kozlov wrote: > We should have added core-libs to review since you modified BigInteger.java. > > webrev02 looks good to me. Let me test it. > > Thanks, > Vladimir > > On 12/20/19 1:52 PM, Kamath, Smita wrote: >> Hi Vladimir, >> >> Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). 
>> Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ >> >> Regards, >> Smita >> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Thursday, December 19, 2019 5:17 PM >> To: Kamath, Smita >> Cc: Viswanathan, Sandhya ; 'hotspot compiler' >> Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 >> >> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: >> >> StubRoutines::_bigIntegerRightShiftWorker >> StubRoutines::_bigIntegerLeftShiftWorker >> Smita>>>done >> >> In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). >> Smita>>>done >> >> Thanks, >> Vladimir >> >> [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ >> >> On 12/18/19 6:33 PM, Kamath, Smita wrote: >>> Hi Vladimir, >>> >>> I have made the code changes you suggested (please look at the email below). >>> I have also enabled the intrinsic to run only when VBMI2 feature is available. >>> The intrinsic shows gains of >1.5x above 4k bit BigInteger. >>> >>> Webrev link: >>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >>> >>> Thanks, >>> Smita >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Wednesday, December 11, 2019 10:55 AM >>> To: Kamath, Smita ; 'hotspot compiler' >>> ; Viswanathan, Sandhya >>> >>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>> precision shifting on x86_64 >>> >>> Hi Kamath, >>> >>> First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code >>> generated by C2. >>> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >>> >>> Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you >>> change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >>> >>> Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we >>> would need test this including performance testing. >>> >>> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >>> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use >>> supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. >>> Smita >>>done >>> >>> I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or >>> other avx512 instructions subset. >>> Smita >> removed UseVBMI2 flag >>> >>> In vm_version_x86.cpp you need to add more %s in print statement for new output. >>> Smita? >>> done >>> >>> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >>> Smita >>> done >>> >>> You need to add intrinsics to Graal's test to ignore them: >>> >>> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >>> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr >>> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >>> Smita >>>done >>> >>> Thanks, >>> Vladimir >>> >>> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>>> Hi, >>>> >>>> >>>> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. 
I >>>> would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This >>>> optimization is for x86_64 architecture enabled. >>>> >>>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>>> >>>> Link to webrev : >>>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>>> >>>> >>>> >>>> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly >>>> implemented. >>>> >>>> >>>> [1] >>>> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >>>> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >>>> Vol. >>>> 2C 5-471) >>>> >>>> [2] >>>> https://software.intel.com/en-us/articles/intel-software-development- >>>> e >>>> mulator >>>> >>>> >>>> Regards, >>>> >>>> Smita Kamath >>>> From vladimir.kozlov at oracle.com Sat Dec 21 05:12:15 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 21:12:15 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: Testing results are good after fixing the typo. We should consider implementing this intrinsic in Graal too. We have to upload AOT and Graal test changes anyway. Thanks, Vladimir On 12/20/19 3:47 PM, Vladimir Kozlov wrote: > Hi Smita, > > You have typo (should be supports_vbmi2): > > src/hotspot/cpu/x86/assembler_x86.cpp:6547:22: error: 'support_vbmi2' is not a member of 'VM_Version' > ??? assert(VM_Version::support_vbmi2(), "requires vbmi2"); > ?????????????????????? ^~~~~~~~~~~~~ > > Debug build failed. I am retesting with local fix. > > Regards, > Vladimir K > > On 12/20/19 2:19 PM, Vladimir Kozlov wrote: >> We should have added core-libs to review since you modified BigInteger.java. >> >> webrev02 looks good to me. Let me test it. >> >> Thanks, >> Vladimir >> >> On 12/20/19 1:52 PM, Kamath, Smita wrote: >>> Hi Vladimir, >>> >>> Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). >>> Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ >>> >>> Regards, >>> Smita >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Thursday, December 19, 2019 5:17 PM >>> To: Kamath, Smita >>> Cc: Viswanathan, Sandhya ; 'hotspot compiler' >>> Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 >>> >>> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: >>> >>> StubRoutines::_bigIntegerRightShiftWorker >>> StubRoutines::_bigIntegerLeftShiftWorker >>> Smita>>>done >>> >>> In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). >>> Smita>>>done >>> >>> Thanks, >>> Vladimir >>> >>> [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ >>> >>> On 12/18/19 6:33 PM, Kamath, Smita wrote: >>>> Hi Vladimir, >>>> >>>> I have made the code changes you suggested (please look at the email below). >>>> I have also enabled the intrinsic to run only when VBMI2 feature is available. >>>> The intrinsic shows gains of >1.5x above 4k bit BigInteger. 
>>>> >>>> Webrev link: >>>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >>>> >>>> Thanks, >>>> Smita >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Wednesday, December 11, 2019 10:55 AM >>>> To: Kamath, Smita ; 'hotspot compiler' >>>> ; Viswanathan, Sandhya >>>> >>>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>>> precision shifting on x86_64 >>>> >>>> Hi Kamath, >>>> >>>> First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code >>>> generated by C2. >>>> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >>>> >>>> Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does >>>> you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >>>> >>>> Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we >>>> would need test this including performance testing. >>>> >>>> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >>>> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use >>>> supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. >>>> Smita >>>done >>>> >>>> I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or >>>> other avx512 instructions subset. >>>> Smita >> removed UseVBMI2 flag >>>> >>>> In vm_version_x86.cpp you need to add more %s in print statement for new output. >>>> Smita? >>> done >>>> >>>> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >>>> Smita >>> done >>>> >>>> You need to add intrinsics to Graal's test to ignore them: >>>> >>>> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >>>> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr >>>> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >>>> Smita >>>done >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>>>> Hi, >>>>> >>>>> >>>>> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I >>>>> would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This >>>>> optimization is for x86_64 architecture enabled. >>>>> >>>>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>>>> >>>>> Link to webrev : >>>>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>>>> >>>>> >>>>> >>>>> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly >>>>> implemented. >>>>> >>>>> >>>>> [1] >>>>> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >>>>> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >>>>> Vol. >>>>> 2C 5-471) >>>>> >>>>> [2] >>>>> https://software.intel.com/en-us/articles/intel-software-development- >>>>> e >>>>> mulator >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Smita Kamath >>>>> From goetz.lindenmaier at sap.com Sat Dec 21 08:14:57 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Sat, 21 Dec 2019 08:14:57 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. 
In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Message-ID: Thanks Vladimir! Best regards, Goetz. > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, December 20, 2019 10:15 PM > To: Lindenmaier, Goetz ; hotspot-compiler- > dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 12/20/19 2:12 AM, Lindenmaier, Goetz wrote: > > Hi Vladimir, > > > > I refactored it a bit, see new webrev: > > http://cr.openjdk.java.net/~goetz/wr19/8235998- > c2_tracing_mem_leak/02/ > > Looks good to me. > > > > > Also, I filed > > 8236414: stringStream allocates on ResourceArea and C-heap > > https://bugs.openjdk.java.net/browse/JDK-8236414 > > ... but I'm not sure how to solve it, as stringStream inherits > > this capability, and having the char* on the ResourceArea > > as before is not good, either. > > Anyways, there are very few places where new is used > > with the stringStream, and the PrintInlining implementation > > is the only one where it's problematic. Maybe it would be > > better to simplify PrintInlining. > > Yes, simplifying PrintInlining is also option. > > Thanks, > Vladimir > > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: Vladimir Kozlov > >> Sent: Donnerstag, 19. Dezember 2019 19:07 > >> To: Lindenmaier, Goetz ; hotspot- > compiler- > >> dev at openjdk.java.net > >> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> > >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >> '8224193: stringStream should not use Resouce Area'. > >> > >> Please, file RFE for refactoring stringStream. > >> > >> Yes, the fix can go into JDK 14. > >> > >> But before that, I see the same pattern used 3 times in compile.cpp: > >> > >> + if (_print_inlining_stream != NULL) _print_inlining_stream- > >>> ~stringStream(); > >> _print_inlining_stream = ; > >> > >> Can you use one function for that? Also our coding style requires to put > body of > >> 'if' on separate line and use {}. > >> > >> thanks, > >> Vladimir > >> > >> On 12/19/19 4:52 AM, David Holmes wrote: > >>> On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: > >>>> Hi, > >>>> > >>>> yes, it is confusing that parts are on the arena, other parts > >>>> are allocated in the C-heap. > >>>> But usages which allocate the stringStream with new() are > >>>> rare, usually it's allocated on the stack making all this > >>>> more simple.? And the previous design was even more > >>>> error-prone. > >>>> Also, the whole way to print the inlining information > >>>> is quite complex, with strange usage of the copy constructor > >>>> of PrintInliningBuffer ... which reaches into GrowableArray > >>>> which should have a constructor that does not use the > >>>> copy constructor to initialize the elements ... > >>>> > >>>> I do not intend to change stringStream in this change. > >>>> So can I consider this reviewed from your side? Or at > >>>> least that there is no veto :)? > >>> > >>> Sorry I was trying to convey this is Reviewed, but I do think this needs > further > >> work in the future. > >>> > >>> Thanks, > >>> David > >>> > >>>> Thanks and best regards, > >>>> ?? Goetz. > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: David Holmes > >>>>> Sent: Donnerstag, 19. 
Dezember 2019 11:39 > >>>>> To: Lindenmaier, Goetz ; Vladimir > Kozlov > >>>>> ; hotspot-compiler- > dev at openjdk.java.net > >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime runtime- > >>>>> dev at openjdk.java.net> > >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >>>>> '8224193: stringStream should not use Resouce Area'. > >>>>> > >>>>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > >>>>>> Hi David, Vladimir, > >>>>>> > >>>>>> stringStream is a ResourceObj, thus it lives on an arena. > >>>>>> This is uncritical, as it does not resize. > >>>>>> 8224193 only changed the allocation of the internal char*, > >>>>>> which always caused problems with resizing under > >>>>>> ResourceMarks that were not placed for the string but to > >>>>>> free other memory. > >>>>>> Thus stringStream must not be deallocated, and > >>>>>> also there was no mem leak before that change. > >>>>>> But we need to call the destructor to free the char*. > >>>>> > >>>>> I think we have a confusing mix of arena and C_heap usage with > >>>>> stringStream. Not clear to me why stringStream remains a > resourceObj > >>>>> now? In many cases the stringStream is just local on the stack. In > other > >>>>> cases if it is new'd then it should be C-heap same as the array and > then > >>>>> you could delete it too. > >>>>> > >>>>> What you have may suffice to initially address the leak but I think this > >>>>> whole thing needs revisiting. > >>>>> > >>>>> Thanks, > >>>>> David > >>>>> > >>>>>> > >>>>>> Best regards, > >>>>>> ??? Goetz. > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: hotspot-runtime-dev >>>>> bounces at openjdk.java.net> > >>>>>>> On Behalf Of David Holmes > >>>>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 > >>>>>>> To: Vladimir Kozlov ; hotspot- > compiler- > >>>>>>> dev at openjdk.java.net > >>>>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime runtime- > >>>>>>> dev at openjdk.java.net> > >>>>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing > after > >>>>>>> '8224193: stringStream should not use Resouce Area'. > >>>>>>> > >>>>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>>>>>>> CCing to Runtime group. > >>>>>>>> > >>>>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not > >> obvious. > >>>>>>>> I would definitively miss to do that if I use stringStreams in some > new > >>>>>>>> code. > >>>>>>> > >>>>>>> But that is not a problem added by this changeset, the problem is > that > >>>>>>> we're not deallocating these stringStreams even though we should > be.? If > >>>>>>> you use a stringStream in new code you have to manage its > lifecycle. > >>>>>>> > >>>>>>> That said why is this: > >>>>>>> > >>>>>>> ??? if (_print_inlining_stream != NULL) > >>>>>>> _print_inlining_stream->~stringStream(); > >>>>>>> > >>>>>>> not just: > >>>>>>> > >>>>>>> delete _print_inlining_stream; > >>>>>>> > >>>>>>> ? > >>>>>>> > >>>>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we > >> explicitly > >>>>>>> calling the destructor rather than calling delete? > >>>>>>> > >>>>>>> Cheers, > >>>>>>> David > >>>>>>> > >>>>>>>> May be someone can suggest some C++ trick to do that > automatically. > >>>>>>>> Thanks, > >>>>>>>> Vladimir > >>>>>>>> > >>>>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> I'm resending this with fixed bugId ... > >>>>>>>>> Sorry! > >>>>>>>>> > >>>>>>>>> Best regards, > >>>>>>>>> ? ?? 
Goetz > >>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams > with > >> new > >>>>> and > >>>>>>>>>> relied on the fact that all memory used is on the ResourceArea > >> cleaned > >>>>>>>>>> after the compilation. > >>>>>>>>>> > >>>>>>>>>> Since 8224193 the char* of the stringStream is malloced and > thus > >>>>>>>>>> must be freed. No doing so manifests a memory leak. > >>>>>>>>>> This is only relevant if the corresponding tracing is active. > >>>>>>>>>> > >>>>>>>>>> To fix TraceLoopPredicate I added the destructor call > >>>>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>>>>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>>>>>>> be walked to free all of them. > >>>>>>>>>> As the GrowableArray is on an arena no destructor is called for > it. > >>>>>>>>>> > >>>>>>>>>> I also changed some as_string() calls to base() calls which > reduced > >>>>>>>>>> memory need of the traces, and added a comment explaining > the > >>>>>>>>>> constructor of GrowableArray that calls the copyconstructor for > its > >>>>>>>>>> elements. > >>>>>>>>>> > >>>>>>>>>> Please review: > >>>>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >>>>>>> c2_tracing_mem_leak/01/ > >>>>>>>>>> > >>>>>>>>>> Best regards, > >>>>>>>>>> ? ?? Goetz. > >>>>>>>>> From dean.long at oracle.com Mon Dec 23 07:22:47 2019 From: dean.long at oracle.com (Dean Long) Date: Sun, 22 Dec 2019 23:22:47 -0800 Subject: [15] Review Request: 8235975 Update copyright year to match last edit in jdk repository for 2014/15/16/17/18 In-Reply-To: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> References: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> Message-ID: The changes to the src/jdk.internal.vm.compiler tree is going to complicate our automated sync with upstream Graal.? Our sync script sets the date based on changes in the Graal repo, so a file that was last modified in 2018 (in upstream Graal) but was added to JDK in 2019 would still have 2018 as the date. dl On 12/22/19 12:24 PM, Sergey Bylokhov wrote: > Hello. > Please review the fix for JDK 15. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235975 > Patch (2 Mb): > http://cr.openjdk.java.net/~serb/8235975/webrev.02/open.patch > Fix: http://cr.openjdk.java.net/~serb/8235975/webrev.02 > > I have updated the source code copyrights by the > "update_copyright_year.sh" > script for 2014/15/16/18/19 years, unfortunately, cannot run it for 2017 > because of: "JDK-8187443: Forest Consolidation: Move files to unified > layout" > which touched all files. > > From Pengfei.Li at arm.com Mon Dec 23 07:53:52 2019 From: Pengfei.Li at arm.com (Pengfei Li) Date: Mon, 23 Dec 2019 07:53:52 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> Message-ID: Hi Andrew, > On 12/20/19 10:21 AM, Pengfei Li wrote: > > Since Nick's recent metaspace reservation fix [1] has completely removed > the use of r27, my patch becomes much simpler now. I have removed the > condition of UseCompressedClassPointers, rebased the code and created a > new webrev. Could you please help review again? 
> > What happens when we use Graal as a replacement for C2, particularly when > Graal needs a heap base register? Regarding your question, the Graal compiler (particularly on AArch64) uses r27 for compressing and uncompressing oops. Neither compressing nor uncompressing klass pointers uses r27. See code at [1], [2]. In Graal, we also wanted to make the heap base register allocatable when UseCompressedOops is off. Since before, AArch64 HotSpot didn't support r27 as an allocatable register, Graal patch e4d9c5f [3] marked r27 as non-allocatable and JVMCI patch JDK-8231754 [4] reserved r27 unconditionally. That's why I would like to revert JDK-8231754 in my patch. [1] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L256 [2] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L286 [3] https://github.com/oracle/graal/commit/e4d9c5f09a3c9be9f3c66ff0feff787519875a12 [4] http://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de -- Thanks, Pengfei From tobias.hartmann at oracle.com Mon Dec 23 08:42:08 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 23 Dec 2019 09:42:08 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <878sn7kn84.fsf@redhat.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> <878sn7kn84.fsf@redhat.com> Message-ID: <8d800afc-4d98-c882-2005-5e3fc1bd69d4@oracle.com> Hi Roland, Thanks for the review. On 20.12.19 16:53, Roland Westrelin wrote: > I'm wondering whether this problem could show up elsewhere and if a more > generic fix would be needed (maybe EA should set the type of the CastPP > so there's no inconsistency when IGVN runs). The fix looks good to > address this particular problem but I think this deserves a follow up > bug to investigate this further. Yes, I agree. As we've discussed offline, I've filed 8236493 [1]. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8236493 From christian.hagedorn at oracle.com Mon Dec 23 09:30:25 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 23 Dec 2019 10:30:25 +0100 Subject: [14] 8235984: C2: assert(out->in(PhiNode::Region) == head || out->in(PhiNode::Region) == slow_head) failed: phi must be either part of the slow or the fast loop Message-ID: Hi Please review the following options for: https://bugs.openjdk.java.net/browse/JDK-8235984 The problem: The original fix for JDK-8233033 [1] assumed that partially peeled statements always have a dependency inside the loop to be unswitched (i.e. when following their outputs eventually a phi node is hit belonging to either the slow or the fast loop). However, that is not always the case. As a result the assert was hit. I suggest the following options: Option a) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.a.00/ We first check at the start of loop unswitching if there are loop predicates for the loop to be unswitched and if they have an additional control dependency to partially peeled statements (outcnt > 1). Then we explicitly check the assumption that partially peeled statements have a dependency in the loop to be unswitched (in order to keep the fix for JDK-8233033). If that is not the case, we bailout. 
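For readers less familiar with the transformation: loop unswitching clones the loop and hoists a loop-invariant test out of it, and the loop predicates above the original loop are cloned for both copies. Below is a minimal Java-level sketch of the loop shape involved; the names are hypothetical and it is not taken from the failing test or the webrev, it only illustrates the transformation itself.

    class UnswitchSketch {
        // The invariant test inside the body makes this loop an unswitching candidate.
        static int sum(int[] a, boolean flag) {
            int s = 0;
            for (int i = 0; i < a.length; i++) {
                if (flag) {          // loop-invariant: does not depend on i
                    s += a[i];
                } else {
                    s -= a[i];
                }
            }
            return s;
            // Conceptually C2 rewrites this as:
            //   if (flag) { for (...) s += a[i]; }   // fast loop
            //   else      { for (...) s -= a[i]; }   // slow loop
            // Statements that were partially peeled ahead of the original loop can
            // remain control dependent on the cloned predicates, which is the
            // situation the assert is checking.
        }
    }
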
We could then file an RFE for JDK-15 to handle this missing case properly and remove the bailout. Option b) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.b.00/ We only check at the start of loop unswitching if there are loop predicates for the loop to be unswitched and if they have an additional control dependency to partially peeled statements (outcnt > 1). If that's the case we bailout without having the fix from JDK-8233033. We could then file an RFE for JDK-15 to properly handle partially peeled statements all together (kinda a REDO of the fix for JDK-8233033). Option c) Trying to fix the missing cases from JDK-8233033 for JDK-14 without a bailout. I've tried to come up with a fix (option c) last week but without success so far. The idea was to also clone the partially peeled statements without a dependency in the loop to be unswitched, change their control input to the correct cloned predicates and then add a phi node on each loop exit, where we merge the slow and fast loop, to select the correct value. However, this has not worked properly, yet and also involves a higher risk due to its complexity. I think we should not target that option for JDK-14 but do it for JDK-15 in an RFE. Thus, I'd opt for either option a) or b). I tested Tier 1-7 for the complete bailout b) and Tier 1-8 for the "conditioned" bailout a). Both look fine. I also ran some standard benchmarks comparing a) and b) to a baseline where I excluded the fix for JDK-8233033 (without bailing out and trying to fix the problem). I could not see any difference in performance. Therefore, it suggests to go with the low risk option b) for JDK-14 and do the entire fix in an RFE for JDK-15. What do you think? Thank you! Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8233033 From richard.reingruber at sap.com Mon Dec 23 09:40:52 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 23 Dec 2019 09:40:52 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: Hi, webrev.3 didn't apply anymore after 8236000 [1]. I've rebased and updated in place: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ The change was minimal. Cheers, Richard. [1] JDK-8236000: VM build without C2 fails -----Original Message----- From: Reingruber, Richard Sent: Dienstag, 10. Dezember 2019 22:45 To: serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi, I would like to get reviews please for http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ Corresponding RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the change is being tested at SAP since I posted the first RFR some months ago. The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI agents request capabilities that allow them to access local variable values. E.g. if you start-up with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right from the beginning, well before a debugger attaches -- if ever one should do so. 
With the enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based optimizations are reverted just before an agent acquires the reference to an object. In the JBS item you'll find more details. Thanks, Richard. [1] Experimental fix for JDK-8214584 based on JDK-8227745 http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch From aph at redhat.com Mon Dec 23 09:43:16 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 23 Dec 2019 10:43:16 +0100 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> Message-ID: <57e85dcb-0d1c-135f-27dd-ec81534f0ad6@redhat.com> On 12/23/19 8:53 AM, Pengfei Li wrote: > Regarding your question, the Graal compiler (particularly on AArch64) uses r27 for compressing and uncompressing oops. Neither compressing nor uncompressing klass pointers uses r27. See code at [1], [2]. > > In Graal, we also wanted to make the heap base register allocatable when UseCompressedOops is off. Since before, AArch64 HotSpot didn't support r27 as an allocatable register, Graal patch e4d9c5f [3] marked r27 as non-allocatable and JVMCI patch JDK-8231754 [4] reserved r27 unconditionally. That's why I would like to revert JDK-8231754 in my patch. > > [1] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L256 > [2] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L286 > [3] https://github.com/oracle/graal/commit/e4d9c5f09a3c9be9f3c66ff0feff787519875a12 > [4] http://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de OK. As long as you've fully tested Graal with UseJVMCICompiler after this patch was applied I'm happy. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Mon Dec 23 13:12:28 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 23 Dec 2019 14:12:28 +0100 Subject: [14] 8235984: C2: assert(out->in(PhiNode::Region) == head || out->in(PhiNode::Region) == slow_head) failed: phi must be either part of the slow or the fast loop In-Reply-To: References: Message-ID: <08f66862-fd59-feb1-5f3b-a0df81c07721@oracle.com> Hi Christian, version b) looks good to me and I would go with that for JDK 14. Best regards, Tobias On 23.12.19 10:30, Christian Hagedorn wrote: > Hi > > Please review the following options for: > https://bugs.openjdk.java.net/browse/JDK-8235984 > > The problem: > The original fix for JDK-8233033 [1] assumed that partially peeled statements always have a > dependency inside the loop to be unswitched (i.e. when following their outputs eventually a phi node > is hit belonging to either the slow or the fast loop). However, that is not always the case. As a > result the assert was hit. 
> > I suggest the following options: > > Option a) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.a.00/ > We first check at the start of loop unswitching if there are loop predicates for the loop to be > unswitched and if they have an additional control dependency to partially peeled statements (outcnt >> 1). Then we explicitly check the assumption that partially peeled statements have a dependency in > the loop to be unswitched (in order to keep the fix for JDK-8233033). If that is not the case, we > bailout. We could then file an RFE for JDK-15 to handle this missing case properly and remove the > bailout. > > Option b) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.b.00/ > We only check at the start of loop unswitching if there are loop predicates for the loop to be > unswitched and if they have an additional control dependency to partially peeled statements (outcnt >> 1). If that's the case we bailout without having the fix from JDK-8233033. We could then file an > RFE for JDK-15 to properly handle partially peeled statements all together (kinda a REDO of the fix > for JDK-8233033). > > Option c) > Trying to fix the missing cases from JDK-8233033 for JDK-14 without a bailout. > > > I've tried to come up with a fix (option c) last week but without success so far. The idea was to > also clone the partially peeled statements without a dependency in the loop to be unswitched, change > their control input to the correct cloned predicates and then add a phi node on each loop exit, > where we merge the slow and fast loop, to select the correct value. However, this has not worked > properly, yet and also involves a higher risk due to its complexity. I think we should not target > that option for JDK-14 but do it for JDK-15 in an RFE. > > Thus, I'd opt for either option a) or b). I tested Tier 1-7 for the complete bailout b) and Tier 1-8 > for the "conditioned" bailout a). Both look fine. I also ran some standard benchmarks comparing a) > and b) to a baseline where I excluded the fix for JDK-8233033 (without bailing out and trying to fix > the problem). I could not see any difference in performance. Therefore, it suggests to go with the > low risk option b) for JDK-14 and do the entire fix in an RFE for JDK-15. > > What do you think? > > > Thank you! > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8233033 From christian.hagedorn at oracle.com Mon Dec 23 13:20:33 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 23 Dec 2019 14:20:33 +0100 Subject: [14] 8235984: C2: assert(out->in(PhiNode::Region) == head || out->in(PhiNode::Region) == slow_head) failed: phi must be either part of the slow or the fast loop In-Reply-To: <08f66862-fd59-feb1-5f3b-a0df81c07721@oracle.com> References: <08f66862-fd59-feb1-5f3b-a0df81c07721@oracle.com> Message-ID: <8fcdc2b1-0e5e-2059-d1f9-b8fa84f8ece1@oracle.com> Hi Tobias Thank you for your review and your estimate! I also think that option b) is probably the best choice for JDK-14. Best regards, Christian On 23.12.19 14:12, Tobias Hartmann wrote: > Hi Christian, > > version b) looks good to me and I would go with that for JDK 14. > > Best regards, > Tobias > > On 23.12.19 10:30, Christian Hagedorn wrote: >> Hi >> >> Please review the following options for: >> https://bugs.openjdk.java.net/browse/JDK-8235984 >> >> The problem: >> The original fix for JDK-8233033 [1] assumed that partially peeled statements always have a >> dependency inside the loop to be unswitched (i.e. 
when following their outputs eventually a phi node >> is hit belonging to either the slow or the fast loop). However, that is not always the case. As a >> result the assert was hit. >> >> I suggest the following options: >> >> Option a) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.a.00/ >> We first check at the start of loop unswitching if there are loop predicates for the loop to be >> unswitched and if they have an additional control dependency to partially peeled statements (outcnt >>> 1). Then we explicitly check the assumption that partially peeled statements have a dependency in >> the loop to be unswitched (in order to keep the fix for JDK-8233033). If that is not the case, we >> bailout. We could then file an RFE for JDK-15 to handle this missing case properly and remove the >> bailout. >> >> Option b) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.b.00/ >> We only check at the start of loop unswitching if there are loop predicates for the loop to be >> unswitched and if they have an additional control dependency to partially peeled statements (outcnt >>> 1). If that's the case we bailout without having the fix from JDK-8233033. We could then file an >> RFE for JDK-15 to properly handle partially peeled statements all together (kinda a REDO of the fix >> for JDK-8233033). >> >> Option c) >> Trying to fix the missing cases from JDK-8233033 for JDK-14 without a bailout. >> >> >> I've tried to come up with a fix (option c) last week but without success so far. The idea was to >> also clone the partially peeled statements without a dependency in the loop to be unswitched, change >> their control input to the correct cloned predicates and then add a phi node on each loop exit, >> where we merge the slow and fast loop, to select the correct value. However, this has not worked >> properly, yet and also involves a higher risk due to its complexity. I think we should not target >> that option for JDK-14 but do it for JDK-15 in an RFE. >> >> Thus, I'd opt for either option a) or b). I tested Tier 1-7 for the complete bailout b) and Tier 1-8 >> for the "conditioned" bailout a). Both look fine. I also ran some standard benchmarks comparing a) >> and b) to a baseline where I excluded the fix for JDK-8233033 (without bailing out and trying to fix >> the problem). I could not see any difference in performance. Therefore, it suggests to go with the >> low risk option b) for JDK-14 and do the entire fix in an RFE for JDK-15. >> >> What do you think? >> >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8233033 From smita.kamath at intel.com Mon Dec 23 18:25:48 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Mon, 23 Dec 2019 18:25:48 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: Hi Vladimir, Thanks for reviewing the code. Can you please sponsor and push the changes? Regards, Smita -----Original Message----- From: Vladimir Kozlov Sent: Friday, December 20, 2019 9:12 PM To: Kamath, Smita Cc: 'hotspot compiler' ; core-libs-dev at openjdk.java.net Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Testing results are good after fixing the typo. 
We should consider implementing this intrinsic in Graal too. We have to upload AOT and Graal test changes anyway. Thanks, Vladimir On 12/20/19 3:47 PM, Vladimir Kozlov wrote: > Hi Smita, > > You have typo (should be supports_vbmi2): > > src/hotspot/cpu/x86/assembler_x86.cpp:6547:22: error: 'support_vbmi2' is not a member of 'VM_Version' > ??? assert(VM_Version::support_vbmi2(), "requires vbmi2"); > ?????????????????????? ^~~~~~~~~~~~~ > > Debug build failed. I am retesting with local fix. > > Regards, > Vladimir K > > On 12/20/19 2:19 PM, Vladimir Kozlov wrote: >> We should have added core-libs to review since you modified BigInteger.java. >> >> webrev02 looks good to me. Let me test it. >> >> Thanks, >> Vladimir >> >> On 12/20/19 1:52 PM, Kamath, Smita wrote: >>> Hi Vladimir, >>> >>> Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). >>> Link to the updated webrev: >>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ >>> >>> Regards, >>> Smita >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Thursday, December 19, 2019 5:17 PM >>> To: Kamath, Smita >>> Cc: Viswanathan, Sandhya ; 'hotspot >>> compiler' >>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>> precision shifting on x86_64 >>> >>> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: >>> >>> StubRoutines::_bigIntegerRightShiftWorker >>> StubRoutines::_bigIntegerLeftShiftWorker >>> Smita>>>done >>> >>> In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). >>> Smita>>>done >>> >>> Thanks, >>> Vladimir >>> >>> [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ >>> >>> On 12/18/19 6:33 PM, Kamath, Smita wrote: >>>> Hi Vladimir, >>>> >>>> I have made the code changes you suggested (please look at the email below). >>>> I have also enabled the intrinsic to run only when VBMI2 feature is available. >>>> The intrinsic shows gains of >1.5x above 4k bit BigInteger. >>>> >>>> Webrev link: >>>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >>>> >>>> Thanks, >>>> Smita >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Wednesday, December 11, 2019 10:55 AM >>>> To: Kamath, Smita ; 'hotspot compiler' >>>> ; Viswanathan, Sandhya >>>> >>>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>>> precision shifting on x86_64 >>>> >>>> Hi Kamath, >>>> >>>> First, general question. What performance you see when VBMI2 >>>> instructions are *not* used with your new code vs code generated by C2. >>>> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >>>> >>>> Second. Sandhya recently pushed 8235510 changes to rollback avx512 >>>> code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >>>> >>>> Third. I would suggest to wait after we fork JDK 14 with this >>>> changes. I think it may be too late for 14 because we would need test this including performance testing. >>>> >>>> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >>>> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit >>>> when UseAVX < 3 ( < avx512). You can also use >>>> supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. 
>>>> Smita >>>done >>>> >>>> I don't think we need separate flag UseVBMI2 - it could be >>>> controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. >>>> Smita >> removed UseVBMI2 flag >>>> >>>> In vm_version_x86.cpp you need to add more %s in print statement for new output. >>>> Smita? >>> done >>>> >>>> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >>>> Smita >>> done >>>> >>>> You need to add intrinsics to Graal's test to ignore them: >>>> >>>> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >>>> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org >>>> /gr >>>> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >>>> Smita >>>done >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>>>> Hi, >>>>> >>>>> >>>>> As per Intel Architecture Instruction Set Reference [1] VBMI2 >>>>> Operations will be supported in future Intel ISA. I would like to >>>>> contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. >>>>> >>>>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>>>> >>>>> Link to webrev : >>>>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>>>> >>>>> >>>>> >>>>> I ran jtreg test suite with the algorithm on Intel SDE [2] to >>>>> confirm that encoding and semantics are correctly implemented. >>>>> >>>>> >>>>> [1] >>>>> https://software.intel.com/sites/default/files/managed/39/c5/32546 >>>>> 2-s d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and >>>>> vpshldv -> Vol. >>>>> 2C 5-471) >>>>> >>>>> [2] >>>>> https://software.intel.com/en-us/articles/intel-software-developme >>>>> nt- >>>>> e >>>>> mulator >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Smita Kamath >>>>> From Sergey.Bylokhov at oracle.com Tue Dec 24 18:22:15 2019 From: Sergey.Bylokhov at oracle.com (Sergey Bylokhov) Date: Tue, 24 Dec 2019 21:22:15 +0300 Subject: [15] Review Request: 8235975 Update copyright year to match last edit in jdk repository for 2014/15/16/17/18 In-Reply-To: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> References: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> Message-ID: <3460a6f6-6178-cc45-5840-0f215eebc53f@oracle.com> Hello. Here is an updated version: Bug: https://bugs.openjdk.java.net/browse/JDK-8235975 Patch (2 Mb): http://cr.openjdk.java.net/~serb/8235975/webrev.03/open.patch Fix: http://cr.openjdk.java.net/~serb/8235975/webrev.03/ - "jdk.internal.vm.compiler" is removed from the patch. - "Aes128CtsHmacSha2EType.java" is updated to "Copyright (c) 2018" On 12/22/19 11:24 pm, Sergey Bylokhov wrote: > Hello. > Please review the fix for JDK 15. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235975 > Patch (2 Mb): http://cr.openjdk.java.net/~serb/8235975/webrev.02/open.patch > Fix: http://cr.openjdk.java.net/~serb/8235975/webrev.02 > > I have updated the source code copyrights by the "update_copyright_year.sh" > script for 2014/15/16/18/19 years, unfortunately, cannot run it for 2017 > because of: "JDK-8187443: Forest Consolidation: Move files to unified layout" > which touched all files. > > -- Best regards, Sergey. 
From xxinliu at amazon.com Tue Dec 24 20:56:53 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 24 Dec 2019 20:56:53 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler Message-ID: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> I updated the subject and switched back to the corporation email. Sorry, I didn't realize that there's a code of conduct. I will pay more attention to it. I validated the patch without PCH using --disable-precompiled-headers on both x86_64 and aarch64. Both of them built well. I still have difficulty verifying SPARC. I tried the submit repo but I don't have permission to push an experimental branch. May I ask a sponsor to submit it on my behalf? Thanks, --lx Hi Lx First, when you post an RFR, please include the bug id in the email's Subject. You can use jdk/submit testing to verify builds on SPARC. We are still building on it, with warnings. Changes seem fine to me but make sure to verify that it builds without PCH. Regards, Vladimir On 12/20/19 9:24 AM, Liu Xin wrote: > Martin, > > Thank you very much. May I know how to validate SPARC? I don't have any > SPARC machine to access. > > Thanks, > --lx > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > wrote: > >> Hi lx, >> >> PPC and s390 parts are ok. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Liu, Xin >>> Sent: Freitag, 20. Dezember 2019 04:48 >>> To: 'hotspot-compiler-dev at openjdk.java.net' >> dev at openjdk.java.net> >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in >>> c1_LIRAssembler >>> >>> Hi, Reviewers, >>> >>> Could you take a look at my webrev? I feel that those BarrierSet >> interfaces >>> have nothing to do with c1_LIRAssembler. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ >>> >>> I tried to build on aarch64 and x86_64 and it's fine. >>> >>> Thanks, >>> --lx >> >>
From Ningsheng.Jian at arm.com Wed Dec 25 05:52:03 2019 From: Ningsheng.Jian at arm.com (Ningsheng Jian) Date: Wed, 25 Dec 2019 05:52:03 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> Message-ID: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified the aarch64 and arm builds locally, and they look fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I updated the subject and switched back to the corporation email. Sorry, I didn't > realize that there's a code of conduct. I will pay more attention to it. > > I validated the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty verifying SPARC. I tried the submit repo but I don't have > permission to push an experimental branch. May I ask a sponsor to submit it on > my behalf? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post an RFR, please include the bug id in the email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warnings. > Changes seem fine to me but make sure to verify that it builds without PCH.
> > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. > >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those barrisetSet > >> interfaces > >>> have nothing with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I try to build on aarch64 and x86_64 and it?s fine. > >>> > >>> Thanks, > >>> --lx > >> > >> From xxinliu at amazon.com Fri Dec 27 03:04:30 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 27 Dec 2019 03:04:30 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> Message-ID: <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> Hi, Ningsheng, Thank you for submitting the trial repo. I got the result email and it has passed all 80 tests. Here is the updated webrev. I updated the reviewers. https://cr.openjdk.java.net/~xliu/8236228/01/webrev/ This is a low-risk patch. As long as we can compile it, it won't have any side-effect at runtime. thanks, --lx ?On 12/24/19, 9:52 PM, "Ningsheng Jian" wrote: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified aarch64 and arm build locally, and it looks fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I update the subject and switch back the corporation email. Sorry, I didn?t > realize that there?s code of conduct. I will pay more attention on it. > > I validate the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty to verify SPARC. I try the submit repo but I don?t have > permission to push an experimental branch. May I ask a sponsor submit it on > behalf of me? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post RFR, please, include bug id in email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warning. > Changes seems fine to me but make sure verify that it builds without PCH. > > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. 
> >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those BarrierSet > >> interfaces > >>> have nothing to do with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I tried to build on aarch64 and x86_64 and it's fine. > >>> > >>> Thanks, > >>> --lx > >> > >>
From Alan.Bateman at oracle.com Sat Dec 28 08:22:05 2019 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Sat, 28 Dec 2019 08:22:05 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: On 20/12/2019 22:19, Vladimir Kozlov wrote: > We should have added core-libs to review since you modified > BigInteger.java. > This adds Objects.checkFromToIndex checks in the middle of several supporting methods. Is IOOBE really possible in these cases or are these stand-ins for always-on asserts to ensure the intrinsic is never used when the preconditions aren't satisfied? -Alan
From jatin.bhateja at intel.com Mon Dec 30 03:11:25 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Mon, 30 Dec 2019 03:11:25 +0000 Subject: [14] RFR(S): 8236443 : Issues with specializing vector register type for phi operand with generic operands Message-ID: Hi All, Please find the patch at the following link: JBS : https://bugs.openjdk.java.net/browse/JDK-8236443 Webrev: http://cr.openjdk.java.net/~jbhateja/8236443/webrev.02/ Generic operand processing has been skipped for non-machine nodes (e.g. Phi) since they are skipped by the matcher. Non-definition operand resolution of a machine node will be able to pull the type information from a non-machine node. Re-organized the code by adding a target-specific routine which returns the operand types for special ideal nodes, e.g. RShiftCntV/LShiftCntV. For such nodes, the definition machine operand varies across targets and vector lengths, so a type-based generic operand resolution is not possible in such cases. Best Regards, Jatin
From hohensee at amazon.com Mon Dec 30 19:16:55 2019 From: hohensee at amazon.com (Hohensee, Paul) Date: Mon, 30 Dec 2019 19:16:55 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> Message-ID: <3CC88DEC-56BE-4FDA-B775-7678F44E252A@amazon.com> Lgtm. Seems very low risk to me too. Paul On 12/26/19, 7:06 PM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote: Hi, Ningsheng, Thank you for submitting the trial repo. I got the result email and it has passed all 80 tests. Here is the updated webrev. I updated the reviewers. 
https://cr.openjdk.java.net/~xliu/8236228/01/webrev/ This is a low-risk patch. As long as we can compile it, it won't have any side-effect at runtime. thanks, --lx On 12/24/19, 9:52 PM, "Ningsheng Jian" wrote: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified aarch64 and arm build locally, and it looks fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I update the subject and switch back the corporation email. Sorry, I didn?t > realize that there?s code of conduct. I will pay more attention on it. > > I validate the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty to verify SPARC. I try the submit repo but I don?t have > permission to push an experimental branch. May I ask a sponsor submit it on > behalf of me? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post RFR, please, include bug id in email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warning. > Changes seems fine to me but make sure verify that it builds without PCH. > > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. > >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those barrisetSet > >> interfaces > >>> have nothing with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I try to build on aarch64 and x86_64 and it?s fine. > >>> > >>> Thanks, > >>> --lx > >> > >> From xxinliu at amazon.com Mon Dec 30 22:39:33 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Mon, 30 Dec 2019 22:39:33 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <3CC88DEC-56BE-4FDA-B775-7678F44E252A@amazon.com> References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> <3CC88DEC-56BE-4FDA-B775-7678F44E252A@amazon.com> Message-ID: Hi, Reviewers, Thanks for reviewing it. Paul is a reviewer. Martin reviewed PPC and s390 and Ningsheng reviewed Arm&Aarch64. Is that good to push? Thanks, --lx ?On 12/30/19, 11:16 AM, "Hohensee, Paul" wrote: Lgtm. Seems very low risk to me too. Paul On 12/26/19, 7:06 PM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote: Hi, Ningsheng, Thank you for submitting the trial repo. I got the result email and it has passed all 80 tests. Here is the updated webrev. I updated the reviewers. 
https://cr.openjdk.java.net/~xliu/8236228/01/webrev/ This is a low-risk patch. As long as we can compile it, it won't have any side-effect at runtime. thanks, --lx On 12/24/19, 9:52 PM, "Ningsheng Jian" wrote: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified aarch64 and arm build locally, and it looks fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I update the subject and switch back the corporation email. Sorry, I didn?t > realize that there?s code of conduct. I will pay more attention on it. > > I validate the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty to verify SPARC. I try the submit repo but I don?t have > permission to push an experimental branch. May I ask a sponsor submit it on > behalf of me? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post RFR, please, include bug id in email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warning. > Changes seems fine to me but make sure verify that it builds without PCH. > > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. > >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those barrisetSet > >> interfaces > >>> have nothing with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I try to build on aarch64 and x86_64 and it?s fine. > >>> > >>> Thanks, > >>> --lx > >> > >>