Question/Extension proposal: references to off-heap objects and support for multiple heaps

Leo Romanoff romixlev at yahoo.com
Fri Jul 27 14:07:38 UTC 2012


Hi Thomas,

Thanks a lot for this very elaborate answer with deep insights into MVM. I
was not aware of this project. I looked at your paper and other papers about
MVM, and it seems very interesting. But of course it is also a far more
ambitious undertaking than what I proposed. MVM aims at isolating Java
applications, whereas I only proposed off-heap objects and multiple heaps
within a single Java application.

(BTW, one thing I'm wondering about is the current status of MVM. Is Oracle
really interested in it? Is there any chance that it becomes a product?
These days there is a lot of research on, and many products based on, OS
virtualization and the virtualization of whole machines. The whole cloud
computing boom is related to it. So the interesting pragmatic question is:
if you need isolation, is it simpler to virtualize the whole OS with a JVM
running on it, or is it still worth virtualizing the JVM along the lines of
the MVM project?)

Another thing that I realized only after reading your answer and the answers
about Taobao's GCIH, and which was not clear to me when I made the proposal,
is the need to take care of references from off-heap objects (or objects on
different heaps) to class metadata. Now I understand that in the general
setting this is required, because classes can be unloaded and reloaded, the
metadata storage can be compacted, etc.

At the same time, I have the feeling that some of the approaches described
in this thread so far are not widely used because they probably aim too
high. They try to solve problems in the most general and flexible way, and
this is often too difficult. From a pragmatic point of view, I wonder whether
making small, easy improvements today would be more beneficial than spending
years trying to produce a generic grand architecture that solves all
problems... Sometimes perfectionism (going for revolution) can hinder steady
evolution ;-)

For example, from a purely pragmatic point of view I see the following small
improvements, which could bring a lot of benefit right away:

- Add support for (1) and (2) from my original proposal, i.e. allow objects
in off-heap memory, plus a few related things.

- If it is problematic to have references from those off-heap objects to
on-heap objects or to class metadata, simply restrict or prohibit them.
After all, today people store only byte buffers off-heap. So even if only
one kind of object, namely byte arrays (byte[]) or arrays of built-in types,
were allowed to be placed off-heap, this would already bring a lot of
performance benefits. It would eliminate the need to serialize/write into an
on-heap byte[] and then copy it into off-heap memory, because it would make
it possible to serialize/write into an off-heap byte[] directly (I expect at
least a 2x speedup for code that can make use of it). BTW, another nice
thing about arrays of built-in types is that they never refer to any on-heap
objects. They only refer to the class metadata for arrays of built-in types,
and that metadata is never freed anyway, I guess.

- Later (and probably with some limitations), allow allocating instances of
a broader set of classes off-heap. The limitation could be, e.g., that such
classes cannot have fields of non-built-in types, or that they can refer only
to objects of classes that are also allocated off-heap. If it turns out not
to be too difficult to allow references from off-heap objects to on-heap
objects, the corresponding restriction could be removed.
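HotSpot has no off-heap byte[] today, so the following sketch can only use the existing java.nio API to show the extra copy that the first, restricted step would remove. The record layout (a long id, a double price and an int quantity) and the class and method names are hypothetical; a real cache would usually hand a byte[] or OutputStream to a serialization library rather than write fields by hand.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class OffHeapWriteSketch {
    // Size of the hypothetical record: long id + double price + int quantity.
    static final int RECORD_SIZE = Long.BYTES + Double.BYTES + Integer.BYTES;

    // Today's common pattern: serialize into an on-heap byte[] first,
    // then copy those bytes into off-heap (direct) memory.
    static ByteBuffer serializeThenCopy(long id, double price, int quantity) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream(RECORD_SIZE);
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(id);
            out.writeDouble(price);
            out.writeInt(quantity);
            out.flush();
            byte[] onHeap = bytes.toByteArray();                 // on-heap copy of the record
            ByteBuffer offHeap = ByteBuffer.allocateDirect(onHeap.length);
            offHeap.put(onHeap);                                  // extra copy into off-heap memory
            offHeap.flip();
            return offHeap;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writing the same fields straight into off-heap memory: no intermediate byte[].
    static ByteBuffer writeDirectly(long id, double price, int quantity) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(RECORD_SIZE);
        offHeap.putLong(id);
        offHeap.putDouble(price);
        offHeap.putInt(quantity);
        offHeap.flip();
        return offHeap;
    }

    public static void main(String[] args) {
        ByteBuffer a = serializeThenCopy(42L, 19.99, 7);
        ByteBuffer b = writeDirectly(42L, 19.99, 7);
        // DataOutputStream and a fresh ByteBuffer are both big-endian,
        // so the two buffers hold identical bytes.
        System.out.println(a.equals(b)); // true
    }
}
```

The point of the proposal is that code which insists on a byte[] (for example an existing serializer) could get the behavior of the second method without being rewritten against ByteBuffer.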
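The first limitation in the last item (no fields of non-built-in types) could be checked mechanically. The sketch below does this with plain reflection; the method name hasOnlyPrimitiveFields and the sample classes are hypothetical. Note that even a qualifying object still carries a header pointing at its class metadata, which is exactly the reference problem mentioned earlier.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class OffHeapEligibility {
    // Hypothetical rule: an instance may live off-heap only if none of its
    // instance fields (including inherited ones) can hold an object reference.
    static boolean hasOnlyPrimitiveFields(Class<?> type) {
        if (type.isArray()) {
            // byte[], int[], ... qualify; Object[] or String[] do not.
            return type.getComponentType().isPrimitive();
        }
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                // Static fields belong to the class, not to the instance.
                if (Modifier.isStatic(f.getModifiers())) continue;
                if (!f.getType().isPrimitive()) return false;
            }
        }
        return true;
    }

    static final class Point { int x; int y; }          // primitives only
    static final class Named { int id; String name; }   // holds a reference

    public static void main(String[] args) {
        System.out.println(hasOnlyPrimitiveFields(byte[].class));   // true
        System.out.println(hasOnlyPrimitiveFields(Object[].class)); // false
        System.out.println(hasOnlyPrimitiveFields(Point.class));    // true
        System.out.println(hasOnlyPrimitiveFields(Named.class));    // false
    }
}
```

The alternative rule (references allowed only to other off-heap classes) would need a transitive check over field types instead of the simple primitive test used here.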

When it comes to multiple heaps, I realize that they are much more difficult
to achieve, as previous experience has shown. At the same time, implementing
multiple heaps alone, without isolation as in MVM, without the other RTSJ
aspects and so on, could still be easier than implementing those projects in
all their complexity.
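The cross-heap constraint from the original proposal (objects in a given heap may only reference certain other heaps) boils down to checking every reference store, much like a write barrier. The toy model below simulates that check in ordinary Java; Heap, Obj and storeField are hypothetical names, and a real VM would do this in compiled barrier code and in the GC rather than with per-object bookkeeping like this.

```java
import java.util.HashSet;
import java.util.Set;

public class MultiHeapBarrierSketch {
    // Toy heap with a policy saying which heaps its objects may reference.
    static final class Heap {
        final String name;
        private final Set<Heap> allowedTargets = new HashSet<>();

        Heap(String name) {
            this.name = name;
            allowedTargets.add(this); // references within the same heap are always allowed
        }

        Heap allowReferencesTo(Heap other) {
            allowedTargets.add(other);
            return this;
        }

        boolean mayReference(Heap target) {
            return allowedTargets.contains(target);
        }
    }

    // Toy object that knows which heap it lives in and has one reference field.
    static final class Obj {
        final Heap heap;
        Obj field;

        Obj(Heap heap) { this.heap = heap; }
    }

    // Simulated write barrier: each reference store is checked against the
    // policy of the heap that holds the source object.
    static void storeField(Obj holder, Obj value) {
        if (value != null && !holder.heap.mayReference(value.heap)) {
            throw new IllegalStateException("reference from heap '" + holder.heap.name
                    + "' to heap '" + value.heap.name + "' not permitted");
        }
        holder.field = value;
    }

    public static void main(String[] args) {
        Heap mainHeap = new Heap("main");
        Heap offHeap = new Heap("off-heap");
        mainHeap.allowReferencesTo(offHeap); // on-heap objects may point off-heap, not vice versa

        Obj onHeapObj = new Obj(mainHeap);
        Obj offHeapObj = new Obj(offHeap);
        storeField(onHeapObj, offHeapObj);   // allowed by the policy
        try {
            storeField(offHeapObj, onHeapObj); // rejected by the policy
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The policy in main mirrors the restricted first step suggested above: on-heap objects may point to off-heap ones, but not the other way around.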


Thomas Schatzl-2 wrote:
> 
>> Of course, there are also potential drawbacks:
>> - explicit allocation considered harmful
> 
> Not sure what you mean here. Depends on how this "explicit" allocation
> is defined.
> The VM users (programmers, administrators, ...) know much more a priori
> about the application and allocation behavior than the memory manager, so
> it may be prudent to give them ways to provide that knowledge.
> 

I totally agree with you. I was being a bit sarcastic here ;-) What I meant
was the general Java/JVM approach of using GC instead of explicit memory
management as known from C/C++ etc. There are still many people around with
almost religious views for or against GC-based memory management, depending
on who you ask. In this sense it is similar to the views and discussions
around the classic "Goto considered harmful" statement.

-Leo


Thomas Schatzl-2 wrote:
> 
> Hi,
> 
> On Thu, 2012-07-26 at 07:15 -0700, Leo Romanoff wrote:
>> Hi,
>> 
>> I'm working on an object cache project a-la Terracotta BigMemory, which
>> makes use of the off-heap storage. The
>>[...]
>> Questions:
>> - Is such an extension as described here technically feasible? Would it
>> really require just minor changes in the HotSpot JVM/GC as I explained, or
>> am I missing something obvious that would make it very difficult or
>> impossible to implement? I understand that there is also a "political"
>> dimension to such an extension, which may result in rejecting it for many
>> other reasons. But I'd like to understand its technical feasibility.
> 
> Very minor compared to your other idea, which I will discuss in more detail
> below. G1 already provides a lot of the needed infrastructure.
> The changes likely won't be limited to that single CR though (it seems to
> be Oracle internal), but they seem manageable.
> 
>> Generalization of this idea:
>> 
>> Overall, this proposal is just a special case of a more general approach,
>> which would be to allow multiple (dynamically created/managed) heaps inside
>> one JVM. Each heap may have its own policy for garbage collection, object
>> allocation (e.g. any class or only a specific class, explicit placement
>> support vs. automatic address assignment) and constraints regarding which
>> other heaps can be referenced from a given heap (e.g. only the same heap,
>> only specific heaps, etc.). Obviously, such an approach would require quite
>> some changes to the garbage collection implementation (e.g. checking
>> cross-references between heaps, probably special read/write barriers,
>> etc.).
> 
> There is a (currently dormant) project from Sun Labs/Oracle Labs. The
> link http://labs.oracle.com/projects/barcelona/ provides an overview of
> related publications up to 2005.
> They describe at least some of the issues with garbage collection in a
> multi-heap environment nicely.
> Also try searching using "Multi-tasking Virtual Machine" or "MVM" as
> keywords.
> 
> Sun/Oracle Labs picked it up again around 2008 to overcome most of the
> GC-related problems. One fairly recent paper ([1]),
> while primarily discussing the performance benefits of the permanent
> generation removal, also contains a few paragraphs about the current
> state of the MVM since it has been used as basis for the experiments.
> There is no public code available for it.
> 
> Not sure to what extent the Java real-time implementations fit your
> description or the use case.
> 
> Another non-Hotspot related effort (in [2]) provides a similar, less
> advanced, system. I believe it's in the field already (Mozilla Firefox).
> I am sure the other big remaining VM vendors have similar systems (.NET
> VM application domains?).
> 
>> It may also require some extensions at the bytecode/language/standard
>> library level, because it should be possible to allocate objects on a given
>> heap either on a per-instance or per-class level (this reminds me of the C++
>> class-specific new operators, which can take optional parameters, which in
>> this case would be a specific heap), move objects/object graphs between
>> heaps and so on.
> 
> Not much if anything has been done in that direction afaik. There are
> some JSRs in that direction (jsr 121, jsr 284 and the mentioned
> real-time specification).
> The age of these jsrs and the apparent lack of mainstream
> implementations, and the state of that mentioned research project after
> 12 years of development indicate that there is too little real interest.
> 
>> If multiple heaps with their own policies were supported, it would open
>> a lot of interesting possibilities:
>>
>>[...]
>>
>> Of course, there are also potential drawbacks:
>> - explicit allocation considered harmful
> 
> Not sure what you mean here. Depends on how this "explicit" allocation
> is defined.
> The VM users (programmers, administrators, ...) know much more a priori
> about the application and allocation behavior than the memory manager, so
> it may be prudent to give them ways to provide that knowledge.
> 
> Of course there is the possibility that the memory manager learns and
> optimizes memory layout over time, but there is the constraint that this
> detection should not have any overhead in time and space. Additionally,
> in many applications this kind of behavior is very transient (there are
> some exceptions like your use case, but that one is probably the
> simplest), so it may be more beneficial to simply provide hints to the VM
> in the general case.
> 
>> - more complex garbage collection implementation
>> - potentially slower garbage collection due to increased complexity
>> 
>> What do you think about this suggestion? Is it possible to implement it
>> technically in an efficient way by extending the current implementation? Is it
> 
> Definitely yes. Note that such an effort would require extensive changes
> to many parts of the current HotSpot VM.
> 
>> possible at all to implement it technically in an efficient way? What would
> 
> Again, yes. There is no technical reason why such a system could not
> work efficiently (say, within +-5% of performance of existing
> collectors) in comparable settings.
> 
>> be the biggest issues to get it working? What would be the implications for
> 
> Everything about application isolation regarding memory management and
> other VM areas; for some of them, see the given literature.
> 
>> security mechanisms, Java memory model, etc? What could be the biggest
> 
> I do not think these pose real problems.
> 
>> obstacle?
> 
> Further, any effort to standardize this functionality.
> 
> Thomas
> 
> [1] Thomas Schatzl, Laurent Daynès, and Hanspeter Mössenböck. 2011.
> Optimized memory management for class metadata in a JVM. In Proceedings
> of the 9th International Conference on Principles and Practice of
> Programming in Java (PPPJ '11). ACM, New York, NY, USA, 151-160.
> DOI=10.1145/2093157.2093182 http://doi.acm.org/10.1145/2093157.2093182
> [2] Gregor Wagner, Andreas Gal, Christian Wimmer, Brendan Eich, and
> Michael Franz. 2011. Compartmental memory management in a modern web
> browser. SIGPLAN Not. 46, 11 (June 2011), 119-128.
> DOI=10.1145/2076022.1993496
> 
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/Question-Extension-proposal%3A-references-to-off-heap-objects-and-support-for-multiple-heaps-tp34215852p34220673.html
Sent from the OpenJDK Hotspot Garbage Collection mailing list archive at Nabble.com.



