RFR (S) 8146801: Allocating short arrays of non-constant size is slow

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Mar 1 17:03:24 UTC 2016


Do you have new performance numbers? I hope it did not regress with new 
code.

2 things left I fill should be addressed.

I think is_large parameter for ClearArrayNode() should be explicit. We 
have only 2 places it is called.

We should avoid to have copies of rep_fast_stos() in .ad files.

The difference in .ad depending on UseFastStosb is format only. We now 
have ability to generate format with conditions - see membar_volatile() 
and others (search %%template).
There is also way to add and set new field in rep_fast_stos to check 
is_large in ins_encode but it is more complicated (adlc changes). So I 
would keep predicate for that.

Thanks,
Vladimir

On 3/1/16 4:47 AM, Aleksey Shipilev wrote:
> Hi Vladimir!
>
> New webrev:
>   http://cr.openjdk.java.net/~shade/8146801/webrev.05/
>
> On 03/01/2016 04:01 AM, Vladimir Kozlov wrote:
>> I was thinking may be we should do it in Ideal graph instead of
>> assembler. But it could trigger Fill array or split iterations
>> optimizations which may not good for such small arrays.
>
> Yes, I thinking about that too, but backed off thinking this is a
> x86-specific optimization. But then again, looking at ClearArray::Ideal
> that happens for all platforms, we should try to emit the runtime check
> there too. Let me see if we can pull that off without messing up the
> generated code. Meanwhile, the webrev above is the assembler-side variant.
>
>> Did CPU, you tested on, supports ERMS (fast stos)?
>
> Yes, tested on i7-4790K (Haswell), /proc/cpuinfo reports "erms", and we
> are going through UseFastStosb branch.
>
>> clear_mem() is used only in .ad which is only C2. You can put it under
>> #ifdef COMPILER2 and you can access Matcher::init_array_short_size then.
>
> Ah, good trick. Still, I think exposing this as the true
> platform-dependent flag is better. It also closely follows what
> ClearArray::Ideal does.
>
>> Why x86_32.ad does not have similar changes?
>
> Hm, I thought I uploaded the webrev with those changes too, but now I
> see it is missing. A new webrev has x86_32 parts.
>
>> Do we really should care for old CPUs (UseFastStosb == false)?
>
> I think we should care, in the same way we care in ClearArray::Ideal.
>
>> Use short branch instructions jccb and jmpb!!!!
>
>> movptr(Address(base, cnt, Address::times_ptr), 0); is too big. You have
>> RAX for that.
>
>> Labels declaration (except DONE) and bind(LONG); should be inside if
>> (!is_large) { since it is only used there.
>
> *embarrased* All three fixed in new webrev.
>
>> You have too many jumps per code. I would suggest next:
>
> The variant of your code is in new webrev (there were two minor bugs:
> shlptr was too late for 32-bit VMs, and no jmpb(DONE)).
>
>
>
> Thanks,
> -Aleksey
>


More information about the hotspot-compiler-dev mailing list