hotspot heap and L1 and L2 cache misses
Christian Thalinger
christian.thalinger at oracle.com
Mon Sep 17 12:49:48 PDT 2012
On Sep 17, 2012, at 12:20 PM, Andy Nuss <andrew_nuss at yahoo.com> wrote:
> What about the case of a new class instance that creates and holds 2 references to medium-length arrays? Are the new instance and its 2 arrays in the same area of the heap?
That depends on what you mean by "the same area". These questions would be better directed to hotspot-gc-dev, though.
-- Chris
>
> From: Christian Thalinger <christian.thalinger at oracle.com>
> To: Andy Nuss <andrew_nuss at yahoo.com>
> Cc: hotspot <hotspot-compiler-dev at openjdk.java.net>
> Sent: Monday, September 17, 2012 11:39 AM
> Subject: Re: hotspot heap and L1 and L2 cache misses
>
>
> On Sep 15, 2012, at 12:03 PM, Andy Nuss <andrew_nuss at yahoo.com> wrote:
>
> > Hi,
> >
> > Let's say I have a function which mutates a finite automaton. It creates lots of small objects (my own link and double-link structures). It also does a lot of puts in my own maps. The objects and maps in turn have references to arrays and some immutable objects.
> >
> > My question is: with all these arrays and objects created in one function that has to do a ton of construction, is there anything to watch out for so that HotSpot will try to colocate all the objects created in this one function/thread on the heap, so that L1/L2 cache misses are reduced when the finite automaton is executed against data?
> >
> > Ideally, someone could tell me that when my class constructors in turn create new instances of other objects and arrays of various sizes, they are all colocated on the heap.
> >
> > Ideally, someone could tell me that when I have a looping function that creates a lot of very small linked-list objects in succession, again they are colocated.
> >
> > In general, how does HotSpot try to help the L1/L2 caches when creating new objects?
> >
> > By the way, I did a test port of my automaton to C++ where, for objects like the above, I had big memory chunks, and my in-place constructors just subdivided the chunk they owned so that all the subobjects were as colocated as possible.
> >
> > This C++ port outperformed my Java version by 5x in execution against data. I also tested the construction-time cost of the automaton, comparing HotSpot's new against my simple in-place C++ member functions, which basically calculate the size of the object, return the current chunk cursor, and advance the cursor past the new object. In those cases I saw 25x performance differences (5 years ago).
>
> TLAB (thread-local allocation buffer) allocations do the same pointer-bump in HotSpot. Does the 5x really come from co-located data? Did you measure it? And maybe you should redo your 25x experiment. 5 years is a long time...
>
> -- Chris
>
> >
> > Andy
>
>
>
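
For readers following the thread, the "pointer-bump" allocation that both the C++ chunk cursor and HotSpot's TLAB fast path rely on can be sketched roughly as below. This is only an illustration of the idea, not HotSpot's actual implementation: the class name BumpArena, its methods, the 8-byte alignment, and the use of a byte[] as the backing chunk are all assumptions made for the example.

```java
/**
 * Hypothetical bump-pointer arena, for illustration only.
 * It hands out offsets into one contiguous chunk, so blocks
 * allocated in succession sit next to each other in memory.
 */
final class BumpArena {
    private static final int ALIGNMENT = 8; // assumed alignment, not taken from HotSpot

    private final byte[] chunk; // the single contiguous backing chunk
    private int cursor;         // offset of the next free byte

    BumpArena(int capacityBytes) {
        this.chunk = new byte[capacityBytes];
    }

    /** Returns the offset of the new block, or -1 if the chunk cannot hold it. */
    int allocate(int sizeBytes) {
        if (sizeBytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        int remaining = chunk.length - cursor;
        if (sizeBytes > remaining) {
            return -1; // a real allocator would fall back to a slower path here
        }
        // Round the size up to the alignment so the next block starts aligned.
        int aligned = (sizeBytes + ALIGNMENT - 1) & -ALIGNMENT;
        if (aligned > remaining) {
            return -1;
        }
        int offset = cursor;
        cursor += aligned; // the "bump": allocation is one add and one compare
        return offset;
    }

    /** Number of bytes handed out so far, including alignment padding. */
    int used() {
        return cursor;
    }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(64);
        int first = arena.allocate(12);  // starts at offset 0, padded to 16 bytes
        int second = arena.allocate(5);  // placed directly after the first block
        System.out.println(first + " " + second + " " + arena.used());
    }
}
```

The point of the sketch is that successive allocations are adjacent by construction, which is the property the original question is after. Whether objects stay adjacent in HotSpot after a garbage collection moves them is a separate question for hotspot-gc-dev, as suggested above.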