NUMA-Aware Java Heaps for in-memory databases

Thu Feb 21 02:58:38 PST 2013

Thanks for the feedback. The "Dealing with JVM Limitations in Apache
Cassandra" presentation was indeed captivating and I felt at home. That
being said, their engine is disk-based: they "cache" the disk in memory
but design for disk, and they serialize everything. An in-memory
database like ActivePivot keeps the entire dataset in memory, with
compression, indexes and data layouts that are best suited for memory
access. That also offers the opportunity to hold object-based data and
aggregates, with custom aggregation functions. I can't stress enough how
important this is to our customers, who inject their business logic into
the core database engine (and love to do it in Java, by the way).

Last week I had lunch with Gil Tene, CTO and co-founder of Azul Systems,
the makers of the Zing JVM (he knows a thing or two about GC and memory
layouts ;). I told him about my concerns regarding NUMA: that NUMA
machines will become more and more common, and that we should adapt our
structures and algorithms to NUMA as we adapted to multicore (we did,
didn't we?). But he does not share my position. To him the performance
issues associated with NUMA are transient and will be fixed or minimized
by chip makers. He also reckons that NUMA optimization for the general
case is VERY hard (impossible?) and that it should not be the burden of
application writers.

If people share his view, and if there is already this tradition of using
off-heap data structures in Java databases, I understand why nobody is
looking hard at making the old generation NUMA-aware.

For ActivePivot I think we will prototype the use of off-heap direct
buffers to hold primitive data. Object data we'll leave in the heap, as
serialization is prohibitive. That may be good enough, knowing that in
typical projects primitive data dominates. Then we'll buy one of the
early Intel Xeon Haswell servers to see if progress is being made.
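
To make the direct-buffer idea concrete, here is a minimal sketch of a
column of doubles kept outside the Java heap. The class name
OffHeapDoubleColumn and its methods are hypothetical and are not
ActivePivot code; the only JDK calls used are ByteBuffer.allocateDirect
and the DoubleBuffer view.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

/** Hypothetical fixed-size column of doubles stored in a direct (off-heap) buffer. */
final class OffHeapDoubleColumn {
    private final DoubleBuffer values;

    OffHeapDoubleColumn(int capacity) {
        // allocateDirect reserves native memory outside the Java heap,
        // so the collector does not copy these values between generations.
        int bytes = Math.multiplyExact(capacity, Double.BYTES);
        this.values = ByteBuffer.allocateDirect(bytes)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
    }

    void set(int row, double value) {
        values.put(row, value); // absolute put, no position bookkeeping
    }

    double get(int row) {
        return values.get(row);
    }

    int capacity() {
        return values.capacity();
    }

    /** Sums rows in [fromRow, toRow), the kind of scan an aggregation query performs. */
    double sum(int fromRow, int toRow) {
        double total = 0;
        for (int row = fromRow; row < toRow; row++) {
            total += values.get(row);
        }
        return total;
    }
}
```

Whether the pages of such a buffer end up on the NUMA node of the thread
that allocates it depends on the operating system's memory policy, so
that part would have to be verified on the target machine rather than
assumed.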

-- 
Antoine CHAMBILLE
Director Research & Development
Quartet FS

On 20 February 2013 15:54, Remi Forax <forax at univ-mlv.fr> wrote:

> On 02/20/2013 02:28 PM, Volker Simonis wrote:
>
>> The limitations you describe are the reason why databases implemented
>> in Java usually use external memory (to a greater or lesser extent) to
>> work around (mostly performance) problems of Java heap memory. There
>> was a very interesting talk at JavaOne about how these problems are
>> solved in Cassandra: "Dealing with JVM Limitations in Apache
>> Cassandra"
>> (https://oracleus.activeevents.com/connect/sessionDetail.ww?SESSION_ID=3586)
>>
>> Regards,
>> Volker
>>
>
> but as far as I know, Cassandra only stores primitive values, not objects.
>
> cheers,
> Rémi
>
>
>
>>
>> On Wed, Feb 13, 2013 at 2:42 PM, Antoine Chambille <ach at quartetfs.com>
>> wrote:
>>
>>> We are developing a Java in-memory analytical database (it's called
>>> "ActivePivot") that our customers deploy on ever larger datasets. Some
>>> ActivePivot instances are deployed on Java heaps close to 1TB, on NUMA
>>> servers (typically 4 Xeon processors and 4 NUMA nodes). This is becoming
>>> a trend, and we are researching solutions to improve our performance on
>>> NUMA configurations.
>>>
>>>
>>> We understand that in the current state of things (and including JDK8)
>>> the support for NUMA in HotSpot is the following:
>>> * The young generation heap layout can be NUMA-aware (partitioned per
>>> NUMA node, objects allocated on the same node as the running thread)
>>> * The old generation heap layout is not optimized for NUMA (at best the
>>> old generation is interleaved among nodes, which at least makes memory
>>> accesses somewhat uniform)
>>> * The parallel garbage collector is NUMA-optimized, the GC threads
>>> focusing on objects in their node.
>>>
>>>
>>> Yet activating the -XX:+UseNUMA option has almost no impact on the
>>> performance of our in-memory database. This is not surprising: the
>>> pattern for a database is to load the data into memory and then run
>>> queries on it. The data goes to and stays in the old generation, and it
>>> is read from there by queries. Most memory accesses are in the old gen
>>> and most of those are not local.
>>>
>>> I guess there is a reason HotSpot does not yet optimize the old
>>> generation for NUMA. It must be very difficult to do it in the general
>>> case, when you have no idea what thread from what node will read data
>>> and interleaving is.
>>> But for an in-memory database this is frustrating because we know very
>>> well which threads will access which piece of data. At least in
>>> ActivePivot, data structures are partitioned and each partition is
>>> assigned a thread pool, so the threads that allocated the data in a
>>> partition are also the threads that perform sub-queries on that
>>> partition. We are a few lines of code away from binding thread pools to
>>> NUMA nodes, and if the garbage collector would leave objects promoted to
>>> the old generation on their original NUMA node, memory accesses would be
>>> close to optimal.
>>>
>>> We have not been able to do that. That being said, I read an inspiring
>>> 2005 article by Mustafa M. Tikir and Jeffrey K. Hollingsworth that did
>>> experiment with NUMA layouts for the old generation ("NUMA-aware Java
>>> heaps for server applications",
>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.6587&rep=rep1&type=pdf).
>>> That motivated me to ask questions on this mailing list:
>>>
>>>
>>> * Are there hidden or experimental HotSpot options that allow NUMA-aware
>>> partitioning of the old generation?
>>> * Do you know why there isn't much (visible, generally available)
>>> progress on NUMA optimizations for the old gen? Is the Java in-memory
>>> database use case considered a rare one?
>>> * Maybe we at Quartet FS should experiment and even contribute new heap
>>> layouts to the OpenJDK project. Can you comment on the difficulty of
>>> that?
>>>
>>>
>>> Thanks for reading, and Best Regards,
>>>
>>> --
>>> Antoine CHAMBILLE
>>> Director Research & Development
>>> Quartet FS
>>>
>>
>
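
For readers of the thread, the layout described in the original message
(partitioned data, one thread pool per partition, the same threads both
loading and querying their partition) might be sketched roughly as
below. PartitionedStore and all of its members are hypothetical names
chosen for illustration, not ActivePivot code. The step of binding each
pool to a NUMA node is deliberately left as a comment, since the
standard Java library offers no API for it and it would need a native
call or an external tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical store in which every partition owns its data and a dedicated thread pool. */
final class PartitionedStore {
    private final double[][] partitions;
    private final ExecutorService[] pools;

    PartitionedStore(int partitionCount, int rowsPerPartition, int threadsPerPartition) {
        partitions = new double[partitionCount][];
        pools = new ExecutorService[partitionCount];
        for (int p = 0; p < partitionCount; p++) {
            final int id = p;
            // Assumption: a real system would bind this pool's threads to one NUMA node here.
            pools[p] = Executors.newFixedThreadPool(threadsPerPartition);
            // Load the partition on its own pool, so the allocating thread
            // belongs to the same pool that will later query the data.
            Callable<double[]> load = () -> {
                double[] rows = new double[rowsPerPartition];
                Arrays.fill(rows, id + 1);
                return rows;
            };
            partitions[p] = await(pools[p].submit(load));
        }
    }

    /** Runs one sub-query per partition on that partition's pool, then merges the results. */
    double sum() {
        List<Future<Double>> partials = new ArrayList<>();
        for (int p = 0; p < partitions.length; p++) {
            final double[] rows = partitions[p];
            Callable<Double> subQuery = () -> {
                double s = 0;
                for (double v : rows) s += v;
                return s;
            };
            partials.add(pools[p].submit(subQuery));
        }
        double total = 0;
        for (Future<Double> partial : partials) total += await(partial);
        return total;
    }

    void close() {
        for (ExecutorService pool : pools) pool.shutdown();
    }

    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Even with such a layout, the point raised in the thread still applies:
once the arrays are promoted to the old generation, the collector is
free to place them without regard to the node of the owning pool.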