Question/Extension proposal: references to off-heap objects and support for multiple heaps

Thu Jul 26 14:15:51 UTC 2012

Hi,

I have a question about GC, which is probably a bit unorthodox... It is
really an extension proposal. I tried to find something related in the
mailing list archives and on the Internet, but couldn't find anything.
Therefore I decided to ask on this mailing list, because the people here are
the ultimate experts on the JVM's GC mechanisms.

I'm working on an object cache project à la Terracotta BigMemory, which
makes use of off-heap storage. The usual approach with such off-heap
solutions is that you first have to serialize objects and then put them as
byte arrays into off-heap memory or direct buffers. This works, but has
quite a few drawbacks, e.g. significant overhead due to serialization and
deserialization, the inability to work with off-heap representations as
with ordinary objects, etc.

Thinking about these issues, I started wondering whether it would be
(theoretically) possible to allocate objects in off-heap memory. I did some
experiments. Right now, it is possible, using low-level sun.misc.Unsafe
tricks, to create an object with proper headers in off-heap memory and refer
to it from on-heap objects or from the stack. You can work with it as with a
normal object without any additional overhead, using normal operations, e.g.
array access, method invocations, field access, etc. But of course this does
not work reliably, because as soon as a full GC happens, the garbage
collector detects a reference from a reachable on-heap object to an address
outside of the heap, and you start getting JVM crashes of all kinds.

Based on these observations and experiments, what would be nice to have is:
1) off-heap objects, which can be referenced from on-heap or on-stack
objects (and, if possible, support for creating such objects at a given
place/address off-heap, i.e. something like explicit placement). It would
also be OK to put some limitations on such off-heap objects, e.g. limits on
the set of classes whose instances could be placed off-heap and referenced
from on-heap objects; limits on what can be referenced from off-heap
objects; off-heap object alignment rules, etc.
2) off-heap objects are pinned/non-movable from the GC's point of view:
under no circumstances should the GC try to move them around.
3) (optional) off-heap objects, which are allowed to refer to on-heap
objects. If this were possible, the GC would of course have to scan
reachable off-heap objects to find references to on-heap objects and mark
them as reachable.

What would be required to achieve at least (1) and (2)? Is it feasible
without too many changes to HotSpot/GC? At first glance, I have the naive
impression that one could try to relax the condition that all references
from on-heap objects must point to an address inside the heap. Instead, a
reference would have to point to an address inside the heap or inside one
of the off-heap memory regions allocated by the current application. One can
still check that all the object headers are OK and conform to the JVM rules.
And once such a reference to an off-heap object is found, there is no need
to trace/scan the referenced off-heap object, because it is known that such
objects cannot refer to on-heap objects. If (3) has to be supported as well,
off-heap objects need to be scanned too, which may become tricky. But let's
not concentrate on (3) for now.
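A toy model of the relaxed check, assuming a flat address space simulated with a Map; it is not HotSpot code, and all names (RelaxedMarkSketch, Region, offHeapRegions, heapObjects) are hypothetical. A reference into the heap is marked and traced as usual, a reference into a registered off-heap region is accepted but not traced, and anything else is treated as an error, which corresponds to the crashes seen today.

```java
import java.util.*;

public class RelaxedMarkSketch {
    /** Half-open address range [start, end). */
    static final class Region {
        final long start, end;
        Region(long start, long end) { this.start = start; this.end = end; }
        boolean contains(long addr) { return addr >= start && addr < end; }
    }

    final Region heap;
    final List<Region> offHeapRegions = new ArrayList<>();
    // Toy heap: object address -> addresses stored in its reference fields (0 = null).
    final Map<Long, long[]> heapObjects = new HashMap<>();

    RelaxedMarkSketch(Region heap) { this.heap = heap; }

    /** Marks everything reachable from the roots; returns the marked on-heap addresses. */
    Set<Long> mark(long... roots) {
        Set<Long> marked = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        for (long r : roots) visit(r, marked, work);
        while (!work.isEmpty()) {
            for (long ref : heapObjects.getOrDefault(work.pop(), new long[0])) {
                visit(ref, marked, work);
            }
        }
        return marked;
    }

    private void visit(long ref, Set<Long> marked, Deque<Long> work) {
        if (ref == 0) return;                       // null reference
        if (heap.contains(ref)) {                   // ordinary on-heap object: mark and trace
            if (marked.add(ref)) work.push(ref);
            return;
        }
        for (Region r : offHeapRegions) {           // relaxed rule: accepted but never traced,
            if (r.contains(ref)) return;            // since off-heap objects cannot point on-heap
        }
        throw new IllegalStateException(
            "reference outside heap and registered regions: 0x" + Long.toHexString(ref));
    }

    public static void main(String[] args) {
        RelaxedMarkSketch gc = new RelaxedMarkSketch(new Region(0x1000, 0x2000));
        gc.offHeapRegions.add(new Region(0x9000, 0xA000));
        gc.heapObjects.put(0x1000L, new long[] {0x1100, 0x9000}); // one on-heap, one off-heap field
        gc.heapObjects.put(0x1100L, new long[] {0});
        System.out.println(gc.mark(0x1000).size());             // prints 2: off-heap object not marked
    }
}
```

Because off-heap objects are never traced in this model, marking cost does not grow with the amount of data kept off-heap; supporting (3) would mean replacing the early return for off-heap regions with a scan of that object's fields.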

Questions:
- Was something like this already discussed/considered by JVM developers or
researchers? If so, could you provide links/references to such discussions
and related issues?

- Is such an extension as described here technically feasible? Would it
really require just minor changes to the HotSpot JVM/GC, as I explained, or
am I missing something obvious that would make it very difficult or
impossible to implement? I understand that there is also a "political"
dimension to such an extension, which may result in rejecting it for many
other reasons. But I'd like to understand its technical feasibility.

Generalization of this idea:

Overall, this proposal is just a special case of a more general approach,
which would be to allow multiple (dynamically created/managed) heaps inside
one JVM. Each heap may have its own policy for garbage collection, object
allocation (e.g. any class or only a specific class, explicit placement
support vs. automatic address assignment) and constraints regarding which
other heaps can be referenced from a given heap (e.g. only the same heap,
only specific heaps, etc.). Obviously, such an approach would require quite
a few changes to the garbage collection implementation (e.g. checking
cross-references between heaps, probably special read/write barriers, etc.).
It may also require some extensions at the bytecode/language/standard
library level, because it should be possible to allocate objects on a given
heap on either a per-instance or a per-class level (this reminds me of C++
class-specific new operators, which can take optional parameters; here the
parameter would be a specific heap), to move objects/object graphs between
heaps, and so on.
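A minimal sketch of per-heap reference constraints enforced by a write barrier, assuming a toy object model rather than real JVM objects. Heap, Obj, allocate and store are hypothetical names, and allocate merely stands in for the per-instance placement mentioned above.

```java
import java.util.*;

public class MultiHeapSketch {
    /** A heap with its own policy: which heaps its objects may point into. */
    static final class Heap {
        final String name;
        final Set<Heap> mayReference = new HashSet<>();
        Heap(String name) { this.name = name; mayReference.add(this); } // same-heap refs always allowed
        Obj allocate(int fieldCount) { return new Obj(this, fieldCount); } // per-instance placement
    }

    /** A toy object: knows its home heap and has a fixed number of reference fields. */
    static final class Obj {
        final Heap heap;
        final Obj[] fields;
        Obj(Heap heap, int fieldCount) { this.heap = heap; this.fields = new Obj[fieldCount]; }
    }

    /** Write barrier: every reference store is checked against the holder's heap policy. */
    static void store(Obj holder, int field, Obj value) {
        if (value != null && !holder.heap.mayReference.contains(value.heap)) {
            throw new IllegalArgumentException(
                holder.heap.name + " may not reference " + value.heap.name);
        }
        holder.fields[field] = value;
    }

    public static void main(String[] args) {
        Heap mainHeap = new Heap("main");
        Heap cacheHeap = new Heap("cache");      // e.g. a non-collectable, self-contained heap
        mainHeap.mayReference.add(cacheHeap);    // main -> cache allowed, cache -> main not

        Obj root = mainHeap.allocate(1);
        Obj entry = cacheHeap.allocate(1);
        store(root, 0, entry);                   // allowed by main's policy
        try {
            store(entry, 0, root);               // rejected by the barrier
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // prints "cache may not reference main"
        }
    }
}
```

A real implementation would also have to handle moving objects/object graphs between heaps and the read-barrier side, both of which this sketch omits.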

If multiple heaps with their own policies were supported, it would open up
a lot of interesting possibilities:
- non-collectable heaps - useful for JNI, interaction with external
processes, explicit control over memory allocation
- heaps at specific memory regions, which could be very interesting for
embedded systems
- light-weight processes (à la Erlang) with their own heaps, where such
heaps can be garbage collected independently
- very fast object caches without much overhead
and many, many more possible applications of such a feature.

Of course, there are also potential drawbacks:
- explicit allocation considered harmful
- more complex garbage collection implementation
- potentially slower garbage collection due to increased complexity

What do you think about this suggestion? Is it technically possible to
implement it efficiently by extending the current implementation? Is it
possible to implement it efficiently at all? What would be the biggest
issues in getting it working? What would be the implications for security
mechanisms, the Java memory model, etc.? What could be the biggest
obstacle?

Thanks in advance for any feedback & comments,
   Leo
-- 
Sent from the OpenJDK Hotspot Garbage Collection mailing list archive at Nabble.com.