RFR (S) 8146801: Allocating short arrays of non-constant size is slow
Vitaly Davidovich
vitalyd at gmail.com
Wed Mar 2 17:50:10 UTC 2016
>
> The prefetching assumes that next allocation will be of the same type
> (instance or array). The prefetching is done for future allocation and not
> a current one. So we can't change it based on size of current allocation.
> Yes, it is very simple approach and we can do better by searching other
> allocations in current code. But I doubt it will give us a lot of benefits.
Indeed, that seems like a somewhat "unprincipled" approach (though I do
appreciate its simplicity). Unless the workload is doing almost nothing but
allocations (which would have other performance implications), the next
allocation may not come for a while. If so, the line prefetched for that
future allocation will probably not survive in the L1 cache. I do hope that
prefetchnta, being non-temporal, will not pull a line into L1 if doing so
would evict an existing line there;
https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe/topic/356760
appears to imply otherwise, although unfortunately nobody from Intel
responded.
> My experiments back then showed that prefetching helps offset zeroing cost
> (in some degree) because cache lines are fetched already. Skipping some
> prefetching may have negative effect.
For zeroing (or a user fill of) a large array that makes sense, although
since the fill/zeroing proceeds linearly, the hardware prefetcher on modern
CPUs may already do a good enough job, if not a better one. It'd be
interesting to verify this on modern hardware.
> Memory accesses are more costly than instruction count.
Agreed, but software prefetch has a nasty habit of either adding nothing
at all or making things worse :). This case in particular seems a bit odd,
since the prefetch here is a shot-in-the-dark guess by the compiler.
Thanks for the discussion guys.
On Wed, Mar 2, 2016 at 12:29 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com
> wrote:
> The prefetching assumes that next allocation will be of the same type
> (instance or array). The prefetching is done for future allocation and not
> a current one. So we can't change it based on size of current allocation.
> Yes, it is very simple approach and we can do better by searching other
> allocations in current code. But I doubt it will give us a lot of benefits.
>
> My experiments back then showed that prefetching helps offset zeroing cost
> (in some degree) because cache lines are fetched already. Skipping some
> prefetching may have negative effect. Memory accesses are more costly than
> instruction count.
>
> Thanks,
> Vladimir
>
>
> On 3/2/16 5:05 AM, Vladimir Ivanov wrote:
>
>> I've no idea whether it would matter in real code. AFAIK, prefetchnta
>>> will bring the line into L1 on modern Intel. Prefetching beyond the
>>> small array allocation would seem undesirable as it increases
>>> instruction stream size for no benefit and may bring in lines that
>>> aren't needed at all.
>>>
>> Still guessing, but considering it still prefetches lines from the current
>> TLAB, subsequent allocations may benefit. Actual
>> performance should heavily depend on the allocation rate, though.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>