RFR(M): 8171181: Supporting heap allocation on alternative memory devices

Kharbas, Kishor kishor.kharbas at intel.com
Wed Mar 1 03:12:27 UTC 2017


Thanks Thomas for providing a different perspective. It will definitely make the implementation simpler and flexible.
I will work on this approach; hopefully there aren’t unforeseen hurdles.

I agree with the other comment, I should fix that.
I have explicitly set a mask as a precaution, since older implementations set permission to 0666.

Thanks
Kishor

From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com]
Sent: Tuesday, February 28, 2017 1:18 AM
To: Kharbas, Kishor <kishor.kharbas at intel.com>; David Holmes <david.holmes at oracle.com>
Cc: hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8171181: Supporting heap allocation on alternative memory devices

Hi Kishor,

This looks interesting. We are responsible for some of the more exotic ports, e.g. AIX, so we have an interest in making a simple and generic solution for potential future different implementations.

So here some more suggestions (apart from what David already wrote):

-----

I see that the mapping to file is implemented in a way that it always follows the anonymous mapping. I dislike about the implementation that we now carry the file-desc argument down through the memory apis and that these APIs are overloaded with yet another implementation detail. Also, it makes the implementation rigid, only applicable for this one issue (allocating non-large-paged java heap).

I suggest a different approach. Keep your map_file_to_memory() function, but instead calling it deep in the belly of os::reserve_memory() and friends, you could add it as a new "first class" api to os.hpp, something like "os::replace_existing_mapping_with_file_mapping(addr, size, filedesc)". That function would attempt to replace the given existing mapping at [addr,size] with the given file mapping, same as your map_file_to_memory does now.
Difference would be that this function can be called anytime after os::reserve_memory() returns, in case of the jave heap in virtualspace.cpp. You would not have to modify the os::reserve_memory() functions and not add another parameter to them.

The process of replacing the anon mapping with the file mapping would be separated from the process of java heap allocation. This approach would be more flexible:

It could be also used for memory reserved in different ways, e.g. for large paged pinned memory. Maybe it would not work in your case, but this leaves more room for future different implementations. In fact, in this case the coding in virtualspace.cpp would be easier, you would not have to distinguish between special (large paged pinned) reservations and normal reservations, just let ReservedSpace reserve its memory range and at the end of ReservedSpace::initialize(), when all the reservation magic is done and the reserved memory range exists - where ever and in whatever form it was allocated - then attempt to replace this with your mapping by calling your "os::replace_existing_mapping_with_file_mapping()".

It may work or not - you may be able to replace whatever memory ReservedSpace did allocate with whatever special memory you are introducing with the file descriptor. For your case, attempting to replace large paged pinned memory with your NVDIMMS, you would probably get an error and could output a clear message to the user that UseLargePages and AllocateHeapAt are mutually exclusive. Or, the large paged memory would be silently replaced with small paged memory, in which case one has the choice of either continuing or stopping with an error message.

In fact, this API could also be used for different things than the jave heap, e.g. metaspace or code heap. Or maybe even thread stacks, who knows.

-----------

Apart from these general remarks, here some code review (all remarks refer to os_linux.cpp):

+  const char name_template[] = "/jvmheap.XXXXXX";
+  char *fullname = (char*)alloca(strlen(dir) + sizeof(name_template));
+  (void)strcpy(fullname, dir);
+  (void)strcat(fullname, name_template);

I think the array is 1 byte too short, you need to allocate space for the trailing zero. Same error in os_windows.cpp.

-----

+  // block all signals while we do the file operation.
+  (void)sigprocmask(SIG_BLOCK, &set, &oldset);

This is quite dangerous. You switch off signal handling for the whole process. Any non-delay-able error signal will immediately kill the process without trace. What if another thread has an implicit null check?

I also do not understand why this is needed?

-----

+  // set the file creation mask.
+  mode_t new_mask = S_IRUSR | S_IWUSR;
+  mode_t prev_umask = umask(new_mask);
+
+  // create a new file.
+  int fd = mkstemp(fullname);

What David said, this is for the whole process and therefore may clash with other threads.

David suggested fchmod, but I wonder is it even necessary to set the umask in this case?

http://man7.org/linux/man-pages/man3/mkstemp.3.html

       The file is created with permissions 0600, that is, read plus write
       for owner only.  The returned file descriptor provides both read and
       write access to the file.  The file is opened with the open(2) O_EXCL
       flag, guaranteeing that the caller is the process that creates the
       file.

Would the default value not be what you want?

----

At various places you fail if system calls fail. E.g.

+ warning("Failure to create file %s for heap", fullname);

Could you print out the errno for those cases, or strerror() ?

----

Kind regards, Thomas



On Tue, Feb 28, 2017 at 12:52 AM, Kharbas, Kishor <kishor.kharbas at intel.com<mailto:kishor.kharbas at intel.com>> wrote:
Hello all!
I have proposed a JEP to support allocation of Java heap on new memory devices such as NV-DIMMs like Intel's 3D XPoint.
Such a feature has applications in multi-JVM deployments, big data applications and in-memory databases.
Please find the details here - https://bugs.openjdk.java.net/browse/JDK-8171181 and the patch for the implementation is at http://cr.openjdk.java.net/~kkharbas/8153327_webrev.04
I would appreciate a review and feedback from the community for this JEP.
Thank you!
Kishor



More information about the hotspot-runtime-dev mailing list