RFR: 8284742: x86: Handle integral division overflow during parsing
Quan Anh Mai
duke at openjdk.java.net
Wed Apr 13 12:56:16 UTC 2022
On Wed, 13 Apr 2022 10:46:27 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi,
>>
>> This patch moves the handling of integral division overflow on x86 from code emission time to parsing time. This allows the compiler to perform more efficient transformations and also aids in achieving better code layout.
>>
>> I also removed the handling for division by 10 in the ad file since it has been handled in `DivLNode::Ideal` already.
>>
>> Thank you very much.
>>
>> Before:
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> IntegerDivMod.testDivide 1024 mixed avgt 5 2394.609 ± 66.460 ns/op
>> IntegerDivMod.testDivide 1024 positive avgt 5 2411.390 ± 136.849 ns/op
>> IntegerDivMod.testDivide 1024 negative avgt 5 2396.826 ± 57.079 ns/op
>> IntegerDivMod.testDivideHoistedDivisor 1024 mixed avgt 5 2121.708 ± 17.194 ns/op
>> IntegerDivMod.testDivideHoistedDivisor 1024 positive avgt 5 2118.761 ± 10.002 ns/op
>> IntegerDivMod.testDivideHoistedDivisor 1024 negative avgt 5 2118.739 ± 22.626 ns/op
>> IntegerDivMod.testDivideKnownPositive 1024 mixed avgt 5 2467.937 ± 24.213 ns/op
>> IntegerDivMod.testDivideKnownPositive 1024 positive avgt 5 2463.659 ± 6.922 ns/op
>> IntegerDivMod.testDivideKnownPositive 1024 negative avgt 5 2480.384 ± 100.979 ns/op
>>
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> LongDivMod.testDivide 1024 mixed avgt 5 8312.558 ± 18.408 ns/op
>> LongDivMod.testDivide 1024 positive avgt 5 8339.077 ± 127.893 ns/op
>> LongDivMod.testDivide 1024 negative avgt 5 8335.792 ± 160.274 ns/op
>> LongDivMod.testDivideHoistedDivisor 1024 mixed avgt 5 7438.914 ± 17.948 ns/op
>> LongDivMod.testDivideHoistedDivisor 1024 positive avgt 5 7550.720 ± 572.387 ns/op
>> LongDivMod.testDivideHoistedDivisor 1024 negative avgt 5 7454.072 ± 70.805 ns/op
>> LongDivMod.testDivideKnownPositive 1024 mixed avgt 5 12120.874 ± 82.832 ns/op
>> LongDivMod.testDivideKnownPositive 1024 positive avgt 5 8898.518 ± 29.827 ns/op
>> LongDivMod.testDivideKnownPositive 1024 negative avgt 5 562.742 ± 2.795 ns/op
>>
>> After:
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> IntegerDivMod.testDivide 1024 mixed avgt 5 2174.521 ± 13.054 ns/op
>> IntegerDivMod.testDivide 1024 positive avgt 5 2172.389 ± 7.721 ns/op
>> IntegerDivMod.testDivide 1024 negative avgt 5 2171.290 ± 12.902 ns/op
>> IntegerDivMod.testDivideHoistedDivisor 1024 mixed avgt 5 2049.926 ± 29.098 ns/op
>> IntegerDivMod.testDivideHoistedDivisor 1024 positive avgt 5 2043.896 ± 11.702 ns/op
>> IntegerDivMod.testDivideHoistedDivisor 1024 negative avgt 5 2045.430 ± 17.232 ns/op
>> IntegerDivMod.testDivideKnownPositive 1024 mixed avgt 5 2281.506 ± 81.440 ns/op
>> IntegerDivMod.testDivideKnownPositive 1024 positive avgt 5 2279.727 ± 21.590 ns/op
>> IntegerDivMod.testDivideKnownPositive 1024 negative avgt 5 2275.898 ± 3.692 ns/op
>>
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> LongDivMod.testDivide 1024 mixed avgt 5 8321.347 ± 93.932 ns/op
>> LongDivMod.testDivide 1024 positive avgt 5 8352.279 ± 213.565 ns/op
>> LongDivMod.testDivide 1024 negative avgt 5 8347.779 ± 203.612 ns/op
>> LongDivMod.testDivideHoistedDivisor 1024 mixed avgt 5 7313.156 ± 113.426 ns/op
>> LongDivMod.testDivideHoistedDivisor 1024 positive avgt 5 7299.939 ± 38.591 ns/op
>> LongDivMod.testDivideHoistedDivisor 1024 negative avgt 5 7313.142 ± 100.068 ns/op
>> LongDivMod.testDivideKnownPositive 1024 mixed avgt 5 9322.654 ± 276.328 ns/op
>> LongDivMod.testDivideKnownPositive 1024 positive avgt 5 8639.404 ± 479.006 ns/op
>> LongDivMod.testDivideKnownPositive 1024 negative avgt 5 564.148 ± 6.009 ns/op
>
> Hi @merykitty, nice work!
> Target-specific IR generation looks like an interesting approach, but UDivL/UDivI are currently generated via the intrinsification route, so a post-parsing target lowering stage would ideally be better suited.
>
> We can also take an alternative approach: generate separate matcher rules for the two control paths by setting an attribute on the IR node during the Identity transformation.
> https://github.com/openjdk/jdk/pull/7572#discussion_r813918734
@jatin-bhateja Thanks a lot for your suggestions. The transformation manipulates control flow, so it should be handled during parsing, because the control edge may be lost right after that. The same goes for the UDivL and UDivI intrinsics. I believe target-specific parsing is beneficial because it lets us decompose complex operations into more elementary ones, which the compiler can then optimize more effectively.
Delaying the handling until code emission time may miss opportunities to hoist the check and, in the worst case, results in suboptimal code layout, since the compiler can move the uncommon path out of the common path while the assembler cannot.
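
To illustrate the point about hoisting, here is a rough Java-level analogue of the overflow guard. It is only a sketch and not the IR that the patch generates: the class and method names (`DivOverflowSketch`, `guardedDiv`, `divideAll`) are made up for this example. It relies on two facts: Java defines `Integer.MIN_VALUE / -1` to be `Integer.MIN_VALUE`, while the x86 `idiv` instruction raises a divide error for that input, so the `-1` divisor has to be special-cased somewhere.

```java
// Illustrative sketch only: a source-level analogue of the overflow guard,
// not the actual C2 IR. All names here are hypothetical.
final class DivOverflowSketch {

    // Explicit form of the guard. When the check is a real branch in the IR
    // (rather than being emitted inside the division instruction), the
    // optimizer can see it and move or eliminate it.
    static int guardedDiv(int dividend, int divisor) {
        if (divisor == -1) {
            // Uncommon path: negation wraps for MIN_VALUE, which matches
            // the result Java requires for MIN_VALUE / -1.
            return -dividend;
        }
        // Common path: plain division (still throws on divisor == 0).
        return dividend / divisor;
    }

    // With a loop-invariant divisor, the check can be done once outside the
    // loop, leaving a loop body with no overflow branch at all.
    static void divideAll(int[] src, int[] dst, int divisor) {
        if (divisor == -1) {
            for (int i = 0; i < src.length; i++) {
                dst[i] = -src[i];
            }
        } else {
            for (int i = 0; i < src.length; i++) {
                dst[i] = src[i] / divisor;
            }
        }
    }
}
```

If the guard only exists inside the emitted instruction sequence, neither the hoisting in `divideAll` nor moving the `-1` branch off the hot path is available to the compiler.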
-------------
PR: https://git.openjdk.java.net/jdk/pull/8206