Fwd: Re: Fwd: Improve large array allocation / gc & intrinsics

Thu Feb 13 00:07:11 PST 2014

Transferring to the hotspot-dev list.

Does anybody else have an opinion on array wastage / improvements?

PS: using sun.misc.Unsafe, it seems possible to allocate memory chunks and
reallocate them, i.e. expand a block without losing data.

It may be possible to use this feature to provide more efficient growable
arrays, but it becomes unsafe and there is no [index] operator!

Using Unsafe, it also seems possible to do "pointer arithmetic", i.e.
accessing a matrix as a 1d array, a 2d array or an n-dimensional array.

I could try making a prototype, or maybe someone has already evaluated such
a design and run benchmarks?

Laurent

---------- Forwarded message ----------
From: "Laurent Bourgès" <bourges.laurent at gmail.com>
Date: 11 Feb 2014 17:41
Subject: Re: Fwd: Improve large array allocation / gc & intrinsics
To: "Thomas Schatzl" <thomas.schatzl at oracle.com>
Cc:

> Thomas,
> Thanks a lot for your point of view and comments.
>
> A few more explanations:
>
> Two use cases:
> - growable arrays like StringBuilder or the int arrays in Java2D Pisces:
> the final size cannot be estimated because it depends on application
> data, use cases or missing API features. In such cases arrays are created
> with an initial size. When that is not enough, a new, larger array is
> created (zero-filled) and the data is copied into it (System.arraycopy).
> As you know, this wastes both memory (depending on the initial size and
> the growth factor) and CPU time (array copies). There are many such use
> cases in the JDK (collections, Java2D...)
> - image processing or scientific applications: arrays are created when
> needed, like temporary objects, so a lot of waste is produced. For
> example, an image processing pipeline may create one image per stage
> instead of using an image pool, and Java2D creates many arrays for each
> shape to render, producing hundreds of megabytes that the GC has to
> reclaim...
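
To make the cost of the first use case concrete, here is a minimal sketch of
the copy-on-grow pattern. The class GrowableIntArray and its counters are
hypothetical names chosen for illustration; this is not code from the JDK or
from Pisces, and the doubling growth factor is only an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration of a copy-on-grow int array; not JDK or Pisces code.
final class GrowableIntArray {
    private int[] data;
    private int size;
    private long copiedElements; // total elements moved by all growth steps

    GrowableIntArray(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    void add(int value) {
        if (size == data.length) {
            // Allocate a zero-filled array twice as large (assumed growth
            // factor) and copy the old content; the old array becomes garbage.
            copiedElements += size;
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    int size() { return size; }

    int capacity() { return data.length; }

    // Slots allocated but never used: the memory waste mentioned above.
    int unusedSlots() { return data.length - size; }

    // Elements copied so far: the CPU cost mentioned above.
    long copiedElements() { return copiedElements; }
}
```

Starting from a capacity of 16 and adding 1000 values, this sketch grows six
times, so both counters give a rough idea of how the initial size and growth
factor trade memory waste against copying.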
>
> It is also quite common to transform array dimensions: n dims <=> 1d. To
> do so, new arrays are created and the data is copied. A transformation API
> could help, as could working directly on memory (memcpy) or getting direct
> access to the memory chunk, as Unsafe allows.
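
As a rough idea of what such a transformation API might look like, the sketch
below exposes a flat double[] as a row-major 2d matrix without copying. It
uses an ordinary heap array and index arithmetic rather than Unsafe, and the
MatrixView class and its methods are hypothetical names, not an existing JDK
API.

```java
// Hypothetical helper: a row-major 2d view over a flat array, with no copy.
final class MatrixView {
    private final double[] data;
    private final int rows;
    private final int cols;

    MatrixView(double[] data, int rows, int cols) {
        if (rows < 0 || cols < 0 || (long) rows * cols != data.length) {
            throw new IllegalArgumentException("dimensions do not match length");
        }
        this.data = data;
        this.rows = rows;
        this.cols = cols;
    }

    double get(int row, int col) {
        return data[index(row, col)];
    }

    void set(int row, int col, double value) {
        data[index(row, col)] = value;
    }

    // Reinterpret the same storage with other dimensions (n dims <=> 1d).
    MatrixView reshape(int newRows, int newCols) {
        return new MatrixView(data, newRows, newCols);
    }

    // Row-major mapping from (row, col) to the offset in the flat array.
    private int index(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return row * cols + col;
    }
}
```

Because every view shares the backing array, a write through a 2 x 3 view is
visible through a 3 x 2 view of the same data and in the flat array itself.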
>
> > > > I would like to propose a JDK9 RFE to improve JVM efficiency when
> > > > dealing with large arrays (allocation + gc).
> > > >
> > > > In several scientific applications (and my patched java2d pisces),
> > > > many large arrays are needed, created on the fly and it becomes very
> > > > painful to recycle them using an efficient array cache (concurrency,
> > > > cache size tuning, clear + cache eviction issues).
> >
> > Why do you think that a one-size-fits-all approach, in any library in
> > the JDK, will not have the same issues to deal with? How do you know that
> > a generic library in the JDK (as in your proposal) or hacking things
> > into the VM can deal with this problem better than you who knows the
> > application allocation patterns?
>
> As Java developers often do not care about memory allocation / GC, I think
> it is more effective to handle arrays efficiently at the JDK level, which
> will benefit all applications.
>
> Alternatively, a new API could be enough: GrowableArray and an array
> transformer (dimension converter)...
>
> >
> > Do you have a prototype (library) for that?
>
> Not really, but I patched Pisces to be more memory efficient: thread-local
> context + array cache: https://github.com/bourgesl/marlin-renderer
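
For readers who have not seen the idea, a thread-local array cache can be
sketched roughly as follows. This is a hypothetical illustration and not the
actual marlin-renderer code: the class name IntArrayCache, the bound of 8
cached arrays and the rounding of lengths to multiples of 8 are all
assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical per-thread int[] cache; not the marlin-renderer implementation.
final class IntArrayCache {
    private static final int MAX_CACHED = 8; // arbitrary bound (assumption)
    private static final ThreadLocal<IntArrayCache> LOCAL =
            ThreadLocal.withInitial(IntArrayCache::new);

    private final ArrayDeque<int[]> free = new ArrayDeque<>();

    // One cache per thread, so no synchronization is needed.
    static IntArrayCache current() {
        return LOCAL.get();
    }

    // Returns a zero-filled array of at least minLength elements.
    int[] acquire(int minLength) {
        if (minLength < 0) {
            throw new IllegalArgumentException("negative length");
        }
        for (Iterator<int[]> it = free.iterator(); it.hasNext();) {
            int[] candidate = it.next();
            if (candidate.length >= minLength) {
                it.remove();
                return candidate;
            }
        }
        return new int[roundUp(minLength)];
    }

    // Gives an array back; only the used prefix is cleared, not the whole array.
    void release(int[] array, int usedLength) {
        if (free.size() < MAX_CACHED) {
            Arrays.fill(array, 0, usedLength, 0);
            free.push(array);
        }
        // Otherwise the array is dropped and left to the GC (simple eviction).
    }

    // Rounds up to the next multiple of 8 (assumed policy, no overflow guard).
    static int roundUp(int n) {
        return (n + 7) & ~7;
    }
}
```

A caller would use IntArrayCache.current().acquire(n), work on the array, then
release it with the number of elements it actually touched. The hard parts
mentioned in this thread (cache size tuning and eviction) are reduced here to
a fixed bound.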
>
> FYI, I will try using off-heap arrays to be even more efficient.
>
> >
> > > > In this case, the GC overhead leads to a big performance penalty
> > > > (several hundred megabytes per second).
> >
> > I do not understand what the problem is. Typically I would not specify
> > performance (throughput?) penalty in megabytes per second.
>
> I can give you numbers, but I encountered a very big slowdown due to
> growable arrays...
>
> >
> > Do the GCs take too long, or do you feel there is too much memory wastage
> > somewhere?
>
> Too much waste and too many array copies.
>
> >
> > > > As such array caches are very efficient in an application context, I
> > > > am wondering if that approach could be promoted at the JDK level
> > > > itself:
> > > >
> > > > - provide a new array allocator for large arrays that can return
> > > > larger arrays than expected (size = 4 or 8 multiples) using array
> > > > caches (per thread ?) stored in a dedicated JVM large memory area
> >
> > The behavior you propose seems very particular to your application.
> > Others may have completely different requirements. The mentioned
> > per-thread caches do not seem to be problematic to do in a library
> > either.
>
> Of course, but at the JDK level it can boost any Java application, not
> only mine!
>
> > > > (GC aware) and providing efficient cache eviction policies
> >
> > Did you ever try one of the other available garbage collectors with
> > your application? Both CMS and G1 never move large objects around, i.e.
> > there is almost no direct GC overhead associated with them.
>
> I am using CMS for my multithreaded benchmarks. However, as there is too
> much waste, the GC overhead comes from heap traversal and scanning for
> live refs.
>
> >
> > Reclaiming them is almost zero cost in these collectors. Keeping them
> > alive seems to be best handled by logic in a library.
> >
> > Can you give examples where the VM has significant advantages over a
> > dedicated library, or a particular use case with measurements that show
> > this could be the case?
>
> To be investigated... I have the feeling that a C-like realloc + copy
> could be efficient for growable arrays...
>
> > > > - may support both clean arrays (zero-filled) and dirty arrays
> > > > because some algorithms do not need zero-filled arrays.
> > > >
> > > > - improve JVM intrinsics (array clear, fill) to achieve maximum
> > > > performance, i.e. take into account the data alignment (4 or 8
> > > > multiples)
> >
> > I think the compiler already uses specialized methods for that, using
> > the "best" instructions that are available. It should also already be
> > able to detect "fill" loops, and vectorize them.
>
> I think so but maybe new low level features could help.
>
> > Objects are always 8 byte aligned - I think you can force higher
> > alignments by setting ObjectAlignmentInBytes or so.
> Excellent.
>
> >
> > Otherwise these changes could be simply added, i.e. seems to not need
> > any RFE.
> Great.
>
> > > > - upgrade GC to recycle such 'cached' arrays (clean), update usage
> > > > statistics and manage cache eviction policy (avoid wasting memory)
> >
> > The GCs already automatically recycle the freed space. Everything else
> > seems to be more complicated to implement at VM level than at library
> > level, with the added drawback that you add a VM dependency.
>
> OK, maybe at the JDK level it would be great.
>
> Thanks again,
> Laurent