RFC: Experiment in accessing/managing persistent memory from Java

Paul Sandoz paul.sandoz at oracle.com
Tue Jun 12 17:12:39 UTC 2018


Hi Jonathan,

> On Jun 8, 2018, at 3:59 AM, Jonathan Halliday <jonathan.halliday at redhat.com> wrote:
> 
> 
> Hi Paul
> 
> Looks like we're all on the same page regarding the basic approach of using a small API and making the critical bits intrinsic. We perhaps have some way to go on exactly what that API looks like in terms of the classes and methods, but iterating on it by discussion of a JEP seems like the best way forward. The important thing from my perspective is that so far nobody has come forward with a use case that is not covered by the proposed primitives. So it's a small API, but not too small.

Yes, a smallish API we can iterate on.


> 
> As far as tweaks go, we have considered making the low-level primitive method / intrinsic just a flushCacheline(base_address), since the arithmetic and loop for writing flush(from, to) in terms of that low-level op is something the JIT can already optimize well. Though that does mean exposing the cache line size to the Java layer, whereas currently it's only visible in the C code.
> 

That’s ok. Keeping the intrinsics as simple as possible and relying on Java code plus the JIT would generally be my preferred pattern.
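
For illustration, the shape being discussed is roughly the following, where flushCacheLine(long) stands in for the hypothetical per-line intrinsic and CACHE_LINE_SIZE for the exposed constant (both names are assumptions, not existing API):

// Sketch only: flushCacheLine(long) and CACHE_LINE_SIZE are hypothetical names
// standing in for the low-level intrinsic and constant discussed above.
final class PmemFlush {
    static final long CACHE_LINE_SIZE = 64;              // typical x86 line size
    private static final long LINE_MASK = ~(CACHE_LINE_SIZE - 1);

    // Placeholder for the intrinsic; a real implementation would emit
    // CLFLUSHOPT/CLWB (x86) or the equivalent cache maintenance instruction.
    private static void flushCacheLine(long address) {
        throw new UnsupportedOperationException("intrinsic placeholder");
    }

    // The range flush stays in plain Java: round down to a line boundary
    // and walk line by line, leaving the loop to the JIT.
    static void flush(long from, long to) {
        for (long line = from & LINE_MASK; line < to; line += CACHE_LINE_SIZE) {
            flushCacheLine(line);
        }
    }
}

Only the innermost call needs to be intrinsic; the range arithmetic and loop are ordinary Java the JIT can optimize.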


> 
> My own background and focus are transaction systems, so I'm more about the speed and fault tolerance than the capacity, but I can see long vs. int being of interest to our Infinispan data grid team and likewise for e.g. Oracle Coherence or databases like Cassandra. OTOH it's not uncommon to prefer moderately sized files and shard over them, which sidesteps the issue.
> 

Ok, which is conveniently how developers currently work around the issue of mapping large files :-)
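
For reference, that sharding workaround usually looks something like the sketch below: map the file in fixed-size chunks so each chunk stays within a ByteBuffer's int index range, and translate a long offset into (chunk, int offset). The class name and the 1 GiB chunk size are illustrative only.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: map a large file as fixed-size chunks so each chunk
// fits within a ByteBuffer's int index range.
final class ShardedMapping {
    static final long CHUNK = 1L << 30;                  // 1 GiB per mapping
    private final MappedByteBuffer[] chunks;

    ShardedMapping(Path file, long length) throws IOException {
        int n = (int) ((length + CHUNK - 1) / CHUNK);
        chunks = new MappedByteBuffer[n];
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (int i = 0; i < n; i++) {
                long pos = i * CHUNK;
                long size = Math.min(CHUNK, length - pos);
                // Mappings remain valid after the channel is closed.
                chunks[i] = ch.map(FileChannel.MapMode.READ_WRITE, pos, size);
            }
        }
    }

    byte get(long offset) {
        return chunks[(int) (offset / CHUNK)].get((int) (offset % CHUNK));
    }

    void put(long offset, byte b) {
        chunks[(int) (offset / CHUNK)].put((int) (offset % CHUNK), b);
    }
}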


> Utility code to assist with fine-grained memory management within the buffer/file may be more useful than support for really large buffers, since they tend to be used with some form of internal block/heap structure anyhow, rather than to hold very large objects. Providing that may be the role of a 3rd party pure Java library like PCJ though, rather than something we want in the JDK itself at this early stage. The researcher in me is kinda interested in how much of the memory allocation and GC code can be re-purposed here though...
> 
> What's the intended timeline on long buffer indexing at present?

Unsure, but it's probably something we want to solve soonish.


> My feeling is a pmem API JEP is probably targeting around JDK 13, but we're flexible on that.

Note that the JEP process can be started before then, and JEPs are not targeted to a release until they are ready; if it's ready sooner, great, otherwise later. My recommendation would be to keep such a JEP focused on the mapping/flushing of byte buffers (BBs) for NVM rather than expanding its scope.
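
To make that scope concrete, the surface under discussion is roughly "map a file that lives on NVM, then flush a range of the resulting buffer". The sketch below only illustrates that shape; the names (PersistentBuffers, mapSync, force) are assumptions, not a proposal.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.file.Path;

// Hypothetical shape only; none of these names exist in the JDK as-is.
interface PersistentBuffers {

    // Map 'length' bytes of a file backed by persistent memory,
    // analogous to a MAP_SYNC mmap on Linux.
    MappedByteBuffer mapSync(Path file, long offset, long length) throws IOException;

    // Write back the given range from the CPU caches so it reaches
    // the persistence domain (the flush primitive discussed earlier).
    void force(MappedByteBuffer buffer, int index, int length);
}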


>  We may also want to look at related enhancements like unmapping buffers. I think those pieces are sufficiently decoupled that they won't be dependencies for the pmem API though, unlike other factors such as the availability of test hardware.
> 

That’s tricky! We have been through many discussions over the years on how to achieve this, without much resolution. Andrew Haley came up with an interesting solution which, IIRC, requires the deallocating/unmapping thread to effectively reach a safepoint and wait for all other threads to pass through a checkpoint. Project Panama is looking at the explicit scoping of resources, perhaps also resources that are thread-confined or owned. My sense is Project Panama will eventually push strongly on this area, and that’s where we should focus our efforts.
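
For illustration, explicit scoping along those lines might look like the sketch below, where the mapping's lifetime is tied to a lexical scope and unmapping happens deterministically on close; ScopedMapping is a hypothetical name, not an actual Panama API.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.file.Path;

// Hypothetical illustration of explicit scoping: unmapping happens
// deterministically when the scope closes, not when the GC eventually
// collects the buffer. ScopedMapping does not exist in the JDK.
interface ScopedMapping extends AutoCloseable {
    MappedByteBuffer buffer();

    @Override
    void close();   // unmaps; any later access must fail safely

    static ScopedMapping map(Path file, long offset, long length) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }
}

class ScopedMappingExample {
    static void use(Path file) throws IOException {
        try (ScopedMapping m = ScopedMapping.map(file, 0, 1 << 20)) {
            m.buffer().put(0, (byte) 42);
        } // released here, once it is safe with respect to other threads
    }
}

The hard part is making close() safe when other threads may still hold a reference to the buffer, which is where the safepoint/checkpoint idea above comes in.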

Thanks,
Paul.

