[aarch64-port-dev ] RFR: 8151502: aarch64: optimize pd_disjoint_words and pd_conjoint_words
Andrew Haley
aph at redhat.com
Wed Mar 9 14:11:19 UTC 2016
On 03/09/2016 02:07 PM, Edward Nevill wrote:
> On Wed, 2016-03-09 at 12:57 +0000, Andrew Haley wrote:
>> On 03/09/2016 12:17 PM, Edward Nevill wrote:
>>> http://cr.openjdk.java.net/~enevill/8151502/JMHSample_97_GCStress.java
>>>
>>> JMH jar file: http://cr.openjdk.java.net/~enevill/8151502/benchmarks.jar
>>>
>>> The following are the results I get
>>
>> Not bad, but not quite perfect. But I guess you knew I'd say that.
>> :-)
>>
>> The switch on count < threshold should be done in C, with multiple
>> inline asm blocks. That way, GCC can do value range propagation for
>> small copies.
>
> Hmm. I did try using switch on my first stab at this but gave up
> when I got the following output for this simple test program (this
> is with stock gcc 5.2).
I was just thinking of something like

  if (cnt < 8) small(); else large();
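
To make that concrete, here is a minimal sketch of the dispatch done in C. It is not the patch under review: the names copy_small, copy_large, disjoint_words_sketch and SMALL_COPY_WORDS are made up for illustration, and plain C loops stand in for what would be separate inline asm blocks. The point is only that the comparison is visible to the compiler, so when cnt is a compile-time constant (as in the bletch() example) GCC can drop the branch it does not need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold, taken from the "cnt < 8" suggestion above. */
enum { SMALL_COPY_WORDS = 8 };

/* Stand-in for a short inline asm block that handles fewer than 8 words. */
static inline void copy_small(const uint64_t *from, uint64_t *to, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++)
        to[i] = from[i];
}

/* Stand-in for the bulk-copy inline asm block: 8 words per iteration,
 * then whatever is left over. */
static inline void copy_large(const uint64_t *from, uint64_t *to, size_t cnt)
{
    size_t i = 0;
    for (; i + 8 <= cnt; i += 8)
        for (size_t j = 0; j < 8; j++)
            to[i + j] = from[i + j];
    for (; i < cnt; i++)
        to[i] = from[i];
}

/* The size test lives in C rather than inside the asm, so the compiler
 * can fold it away when cnt is known at compile time. */
static inline void disjoint_words_sketch(const uint64_t *from, uint64_t *to,
                                         size_t cnt)
{
    if (cnt < SMALL_COPY_WORDS)
        copy_small(from, to, cnt);
    else
        copy_large(from, to, cnt);
}
```

With a constant count such as sizeof blah / sizeof blah[0], only one of the two paths should survive inlining.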
>> void bletch() {
>> Copy::disjoint_words(blah, blah2, sizeof blah / sizeof blah[0]);
>> }
>>
>> Finally, GCC has __builtin_expect(bool). We should use that to emit
>> the large copy and the backwards copy out of line.
>
> I did initially do the large copy out of line. My concern was that
> the register allocator wouldn't handle the two paths and would treat
> x0..x18 as corrupted on both paths, whereas the inline version
> 'only' uses 11 registers.
You can do the pushing and popping yourself.
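
Roughly what I have in mind, as a sketch only. The names (UNLIKELY, copy_words_large, copy_words_backwards, conjoint_words_sketch, SMALL_WORDS) are invented for this example, and the out-of-line bodies are plain C loops. In a real port they would presumably be hand-written stubs that save and restore the registers they use, so the call site does not have to treat x0..x18 as clobbered. The noinline and cold attributes are GCC/Clang extensions.

```c
#include <stddef.h>
#include <stdint.h>

/* Tell the compiler the condition is expected to be false. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical threshold between the inline and out-of-line paths. */
enum { SMALL_WORDS = 8 };

/* Out-of-line forward copy for large counts. */
__attribute__((noinline, cold))
static void copy_words_large(const uint64_t *from, uint64_t *to, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++)
        to[i] = from[i];
}

/* Out-of-line backwards copy, needed when the destination overlaps the
 * source from above. */
__attribute__((noinline, cold))
static void copy_words_backwards(const uint64_t *from, uint64_t *to,
                                 size_t cnt)
{
    while (cnt-- > 0)
        to[cnt] = from[cnt];
}

static inline void conjoint_words_sketch(const uint64_t *from, uint64_t *to,
                                         size_t cnt)
{
    /* Unsigned difference: true only when to lies in [from, from + cnt). */
    if (UNLIKELY((uintptr_t)to - (uintptr_t)from < cnt * sizeof(uint64_t))) {
        copy_words_backwards(from, to, cnt);
        return;
    }
    if (UNLIKELY(cnt >= SMALL_WORDS)) {
        copy_words_large(from, to, cnt);
        return;
    }
    /* Expected case: a short forward copy, kept inline. */
    for (size_t i = 0; i < cnt; i++)
        to[i] = from[i];
}
```

The inline fast path then only needs a few registers, and the cost of saving and restoring the rest is paid only on the paths marked unlikely.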
Andrew.