Fwd: Improve large array allocation / gc & intrinsics

Tue Feb 11 02:28:30 PST 2014

Hi,

  just a few comments...

> From: "Laurent Bourgès" <bourges.laurent at gmail.com>
> Date: 10 Feb 2014 10:24
> Subject: Improve large array allocation / gc & intrinsics
> To: "core-libs-dev" <core-libs-dev at openjdk.java.net>, "discuss" <
> discuss at openjdk.java.net>
> Cc:
> 
> > Dear all,
> >
> > I would like to propose a JDK9 RFE to improve JVM efficiency when
> > dealing with large arrays (allocation + gc).
> >
> > In several scientific applications (and my patched java2d pisces),
> > many large arrays are needed, created on the fly and it becomes very
> > painful to recycle them using an efficient array cache (concurrency,
> > cache size tuning, clear + cache eviction issues).

Why do you think that a one-size-fits-all approach in the JDK will not
have the same issues to deal with? How do you know that a generic
library in the JDK (as in your proposal), or hacking things into the VM,
can deal with this problem better than you, who knows the application's
allocation patterns?

Do you have a prototype (library) for that?

> > In this case, the GC overhead leads to a big performance penalty
> > (several hundred megabytes per seconds).

I do not understand what the problem is. Typically I would not specify a
performance (throughput?) penalty in megabytes per second.

Do the GCs take too long, or do you feel there is too much memory wastage
somewhere?
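
To make the question concrete, here is a minimal sketch of how one could
compare wall-clock time with accumulated GC time, using the standard
java.lang.management beans. The class name GcOverhead, the allocate()
workload and the array sizes are made up for illustration; your real
allocation pattern would go in its place.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    /** Sums the accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Hypothetical workload: allocates 'count' short-lived int arrays of 'length' elements. */
    static long allocate(int count, int length) {
        long checksum = 0;
        for (int i = 0; i < count; i++) {
            int[] a = new int[length];
            a[i % length] = i;                 // touch the array so it is really used
            checksum += a[i % length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        long checksum = allocate(2000, 1 << 18);   // about 1 MB per array (assumed size)
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        System.out.println("checksum=" + checksum
                + " wall=" + wallMillis + "ms gc=" + gcMillis + "ms");
    }
}
```

Numbers of this kind (share of time spent in GC, per collector) would say
much more than an allocation rate alone.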

> > As such array cache are very efficient in an application context, I am
> > wondering if that approach could be promoted at the JDK level itself:
> >
> > - provide a new array allocator for large arrays that can return
> > larger arrays than expected (size = 4 or 8 multiples) using array
> > caches (per thread ?) stored in a dedicated JVM large memory area 

The behavior you propose seems very particular to your application.
Others may have completely different requirements. The mentioned
per-thread caches do not seem to be problematic to do in a library
either.
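
As a rough sketch of what I mean by doing this in a library: a
per-thread cache that may hand out arrays larger than requested, clean
or dirty. Everything here (the class IntArrayCache, the cap of 8 cached
arrays, rounding up to a multiple of 8) is an assumption for
illustration, not an existing JDK API, and a real implementation would
need an eviction policy tuned to the application.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;

public final class IntArrayCache {
    private static final int MAX_CACHED = 8;   // assumed per-thread cap

    // One pool per thread, so no synchronization is needed.
    private static final ThreadLocal<ArrayDeque<int[]>> CACHE =
            ThreadLocal.withInitial(ArrayDeque::new);

    private IntArrayCache() { }

    /** Rounds n up to the next multiple of 8 (no overflow check in this sketch). */
    static int roundUp(int n) {
        return (n + 7) & ~7;
    }

    /**
     * Returns an array with length >= minLength. A recycled array is
     * zero-filled only if 'clean' is true; otherwise it may be dirty.
     */
    public static int[] acquire(int minLength, boolean clean) {
        ArrayDeque<int[]> pool = CACHE.get();
        for (Iterator<int[]> it = pool.iterator(); it.hasNext(); ) {
            int[] a = it.next();
            if (a.length >= minLength) {
                it.remove();
                if (clean) {
                    Arrays.fill(a, 0);
                }
                return a;
            }
        }
        return new int[roundUp(minLength)];    // fresh arrays are already zeroed
    }

    /** Gives an array back to the calling thread's pool; dropped if the pool is full. */
    public static void release(int[] a) {
        ArrayDeque<int[]> pool = CACHE.get();
        if (pool.size() < MAX_CACHED) {
            pool.addFirst(a);
        }
    }
}
```

Arrays that do not fit into the pool are simply dropped and reclaimed by
the GC as usual.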

> > (GC aware) and providing efficient cache eviction policies

Did you ever try one of the other available garbage collectors with
your application? Neither CMS nor G1 moves large objects around, i.e.
there is almost no direct GC overhead associated with them.

Reclaiming them is almost zero cost in these collectors. Keeping them
alive seems to be best handled by logic in a library.

Can you give examples where the VM has significant advantages over a
dedicated library, or a particular use case with measurements that show
this could be the case?

> > - may support for both clean arrays (zero filled) or dirty arrays
> > because some algorithms does not need zero-filled arrays.
> >
> > - improve JVM intrinsics (array clear, fill) to achieve maximum
> > performance ie take into account the data alignment (4 or 8 multiples)

I think the compiler already uses specialized methods for that, using
the "best" instructions that are available. It should also already be
able to detect "fill" loops, and vectorize them.
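
For illustration, these are the two shapes I have in mind: a plain fill
loop and the library call. The class FillDemo is made up for this mail.
Whether the loop is actually recognized and vectorized depends on the VM
version, the JIT compiler and the CPU, so treat this as a starting point
for a measurement, not as a statement about the generated code.

```java
import java.util.Arrays;

public class FillDemo {
    /** Simple counted loop; a candidate for the JIT's fill-loop handling. */
    static void fillLoop(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            a[i] = value;
        }
    }

    /** Library call doing the same thing. */
    static void fillLibrary(int[] a, int value) {
        Arrays.fill(a, value);
    }

    public static void main(String[] args) {
        int[] x = new int[1 << 20];
        int[] y = new int[1 << 20];
        fillLoop(x, 7);
        fillLibrary(y, 7);
        System.out.println("same contents: " + Arrays.equals(x, y));
    }
}
```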

Objects are always at least 8-byte aligned; I think you can force higher
alignments by setting -XX:ObjectAlignmentInBytes.
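
If you want to check which alignment a running VM uses, something like
the following should work on a 64-bit HotSpot VM. It relies on the
HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it is
not portable to other VMs, and I am assuming the option is visible
through that bean on your build. The class name AlignmentCheck is made
up.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class AlignmentCheck {
    /** Reads the current value of the ObjectAlignmentInBytes VM option (HotSpot only). */
    static String objectAlignment() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the VM does not know this option.
        return bean.getVMOption("ObjectAlignmentInBytes").getValue();
    }

    public static void main(String[] args) {
        System.out.println("ObjectAlignmentInBytes=" + objectAlignment());
    }
}
```

Running it once with default settings and once with, for example,
-XX:ObjectAlignmentInBytes=16 shows whether the setting took effect.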

Otherwise these changes could simply be added, i.e. they do not seem to
need an RFE.

> > - upgrade GC to recycle such 'cached' arrays (clean), update usage
> > statistics and manage cache eviction policy (avoid wasting memory)

The GCs already automatically recycle the freed space. Everything else
seems to be more complicated to implement at VM level than at library
level, with the added drawback that you add a VM dependency.

> > Please give me your feedback & opinion and evaluate if this feature
> > seems possible to implement into the JDK (hotspot, gc, core-libs)...

Thanks,
Thomas