hotspot heap and L1 and L2 cache misses

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Wed Sep 26 10:04:55 PDT 2012


Andy,

TLAB stands for Thread-Local Allocation Buffer :-)
So the answer to your question is: definitely yes.

There's a nice article [1] about TLAB sizing on the web.

Best regards,
Vladimir Ivanov

[1]
https://blogs.oracle.com/daviddetlefs/entry/tlab_sizing_an_annoying_little
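To make the idea concrete, here is a minimal conceptual sketch in Java (not HotSpot's actual implementation; the class and method names are invented for illustration) of what a thread-local allocation buffer does: each thread owns a private chunk carved out of the shared eden space and satisfies small allocations with a simple cursor bump, no locking, so consecutive allocations from one thread land next to each other in memory:

```java
// Conceptual sketch only, NOT HotSpot internals: a per-thread buffer
// that hands out "objects" by bumping a cursor.  Because each thread
// has its own buffer, allocation needs no synchronization, and objects
// allocated back-to-back by one thread are contiguous (co-located).
final class TlabSketch {
    private final byte[] buffer;   // this thread's private chunk of "eden"
    private int top;               // bump pointer: next free offset

    TlabSketch(int sizeBytes) {
        this.buffer = new byte[sizeBytes];
        this.top = 0;
    }

    /** Returns the offset of the new "object", or -1 if the buffer is
     *  exhausted (the real VM would then refill the TLAB from the
     *  shared eden space, taking a slower path). */
    int allocate(int sizeBytes) {
        if (top + sizeBytes > buffer.length) return -1;
        int objectStart = top;
        top += sizeBytes;          // the fast path is just this addition
        return objectStart;
    }

    public static void main(String[] args) {
        TlabSketch tlab = new TlabSketch(1024);
        int a = tlab.allocate(16);
        int b = tlab.allocate(24);
        // Consecutive allocations are contiguous, hence co-located:
        System.out.println(a + " " + b);   // prints "0 16"
    }
}
```

This is also why Christian's point below holds: the C++ "chunk cursor" scheme you describe and HotSpot's TLAB fast path are essentially the same pointer bump.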

On 09/26/12 20:39, Andy Nuss wrote:
> I tested TLAB allocations in a single-threaded microbenchmark, and when no
> GC was involved, it seemed like about 5 nanos of overhead to create a
> small object.  That is plenty fast enough.
> 
> However, now I'm wondering about my chained objects.  My long running
> execution function unlinks and relinks many types of chains.  The
> question is, how strong is the guarantee of co-location with a thread,
> i.e. when many Java threads are calling this execution function that
> iteratively creates small objects per thread.  (NOTE: simultaneous calls
> of the execution function do not share objects in any way).  I.e. is
> TLAB a thread-local approach that uses a reasonably sized block of known
> free memory for each thread?
> 
> ------------------------------------------------------------------------
> *From:* Christian Thalinger <christian.thalinger at oracle.com>
> *To:* Andy Nuss <andrew_nuss at yahoo.com>
> *Cc:* hotspot <hotspot-compiler-dev at openjdk.java.net>
> *Sent:* Monday, September 17, 2012 11:39 AM
> *Subject:* Re: hotspot heap and L1 and L2 cache misses
> 
> 
> On Sep 15, 2012, at 12:03 PM, Andy Nuss <andrew_nuss at yahoo.com
> <mailto:andrew_nuss at yahoo.com>> wrote:
> 
>> Hi,
>>
>> Let's say I have a function which mutates a finite automaton.  It
> creates lots of small objects (my own link and double-link structures). 
> It also does a lot of puts in my own maps.  The objects and maps in turn
> have references to arrays and some immutable objects.
>>
>> My question is: for all these arrays and objects created in one function
> that has to do a ton of construction, are there any things to watch out
> for so that HotSpot will try to create all the objects in this one
> function/thread co-located on the heap, so that L1/L2 cache misses are
> reduced when the finite automaton is executed against data?
>>
>> Ideally, someone could tell me that when my class constructors in turn
> create new instances of various other objects and arrays of different
> sizes, they are all co-located on the heap.
>>
>> Ideally, someone could tell me that when I have a looping function
> that creates a lot of very small linked-list objects in succession,
> again they are co-located.
>>
>> In general, what does HotSpot do when creating new objects to help the
> L1/L2 caches?
>>
>> By the way, I did a test port of my automaton to C++ where, for objects
> like the above, I had big memory chunks, and my in-place constructors
> just subdivided the memory chunk they owned so that all the
> subobjects were absolutely as co-located as possible.
>>
>> This C++ ported automaton out-performed my Java version by 5x in
> execution against data.  And in the cases where I tested the
> construction-time cost of the automaton, comparing HotSpot's new against
> my simple in-place C++ member functions (which basically just calculate
> the size of the object, return the current chunk cursor, and advance the
> cursor past the new object), I saw 25x performance differences (5 yrs
> ago).
> 
> TLAB allocations do the same pointer-bump in HotSpot.  Does the 5x really
> come from co-located data?  Did you measure it?  And maybe you should
> redo your 25x experiment.  5 years is a long time...
> 
> -- Chris
> 
>>
>> Andy
> 
> 
> 

