RFR: 8234930: Use MAP_JIT when allocating pages for code cache on macOS [v3]

Fri Oct 9 06:19:22 UTC 2020

On Fri, 9 Oct 2020 06:05:56 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>>> 
>>> Could you explain how the choice between SysV and mmap is made on AIX? It looks like
>>> 
>>> ```
>>> develop(uintx, Use64KPagesThreshold, 0,                                           \
>>>           "4K/64K page allocation threshold.")                                      \
>>> ...
>>>   if (os::vm_page_size() == 4*K) {
>>>     return reserve_mmaped_memory(bytes, NULL /* requested_addr */);
>>>   } else {
>>>     if (bytes >= Use64KPagesThreshold) {
>>>       return reserve_shmated_memory(bytes, NULL /* requested_addr */);
>>>     } else {
>>>       return reserve_mmaped_memory(bytes, NULL /* requested_addr */);
>>>     }
>>>   }
>>> ```
>>> 
>>> (there are only two calls to reserve_shmated_memory and both of them are like above. Is SysV SHM used in product builds?)
>>> For now, the AIX case looks a bit different. The choice is made by the platform and the shared code cannot control
>>> this. So yes, I cannot see how to avoid handle_t or similar.
>> 
>> On AIX we have 4K and 64K pages (actually more, but those are the interesting ones). 64K pages are desirable for
>> larger areas like the heap. 64K pages can only be allocated with SystemV shared memory. mmap'ed memory is always 4K
>> paged. But SystemV shared memory has a number of disadvantages, like the inability to protect the memory, and a large
>> attach alignment (256M). So it is cumbersome.  os::vm_page_size() on AIX is a fake. The hotspot code assumes that the
>> underlying operating system has some sort of "base page size" (usually what is returned by sysconf(_SC_PAGESIZE)),
>> and then optionally some sort of huge page size which follows different rules (e.g. pinned). On AIX things are more
>> fluid.   When investigating 64K page support on AIX I eventually decided to fool hotspot into thinking that the base
>> page size is 64K. Long story; this was way before OpenJDK existed, and this was a proprietary code base with no
>> possibility of changing things upstream. Therefore os::vm_page_size returns 64K ("64K fake mode"). This can be
>> disabled.  So the above code fragment uses mmaped memory if 64K fake mode is disabled, and if it is enabled, it uses
>> mmap for smaller regions and shmget for larger ones.
>>> 
>>> In contrast, THP and MAP_JIT are the way to implement a request from the shared code. Even for THP, shared code seems
>>> to know why it should "realign" (not sure why commit has an alignment_hint parameter, while it is possible to realign
>>> after a regular commit). I assume there is enough context in the shared code that can be provided for platform
>>> functions, without a handle_t. And the same context should anyway be provided to reserve function, so handle_t can be
>>> filled with all necessary information.
>> 
>> I believe the alignment hint and the THP code had their roots in Solaris code. So its current form (I guess) is heavily
>> warped by history. A new implementation would maybe just have a "os::set_tph(start, size)" function and leave it at
>> that. And yes, I do not think it is necessary for os::commit to do this.  In fact, Linux could probably set THP
>> unconditionally when UseTransparentHugePages is active. That would eliminate the need for the alignment_hint
>> parameter and the realign function.  I opened https://bugs.openjdk.java.net/browse/JDK-8253890 to follow up on this.
>
> (more comments)
> 
>> Sorry, I had not highlighted that was a proof-of-concept patch to show API changes. I've pushed another PoC with
>> bookkeeping and no API changes at all. But I don't like the new one either.
> 
> Interesting idea, but IMHO too heavyweight for a platform-only change. Also, GrowableArray may not be the best choice
> here since e.g. it requires you to search twice on add. A better solution may be a specialized BST. If there are other
> uses for such a solution (managing memory regions, melting them together, maybe splitting them on remove) this would be
> worth a generic class. I believe NMT does something similar when managing virtual memory regions, see
> VirtualMemoryTracker and friends.
>> In the new patch, there is a list of (potentially) executable regions that is updated on commit, when the actual
>> desired (non)exec mode becomes known. If we support mixed exec/non-exec commits in a mapping, then after a non-exec
>> commit a part of the mapping cannot be reverted to a potentially executable one (as we've lost MAP_JIT).
> 
> So once you cleared MAP_JIT from a region you cannot re-apply it? Then this is another reason we should not support
> setting and clearing exec on commit but only on a per-mapping base.
>> Then it can produce some unexpected results under _some_ conditions at runtime, while API users can be unaware
>> of potential issues. A good API should not allow that.

Interestingly, if you look at https://github.com/openjdk/jdk/pull/49 (The new Alpine Linux port) it introduces a
function called check_pax().

PaX seems to be a switchable restriction of the Linux kernel which disallows making a memory section executable after
it has been established. Another reason for making "exec" a property of the mapping itself, establishing it at creation
time, and then never touching it again.
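
To illustrate what "exec as a property of the mapping" could look like, here is a minimal sketch. It is not the
patch under review and not HotSpot code: the names Mapping, reserve_mapping, commit_range and release_mapping are
hypothetical, and it assumes a POSIX mmap/mprotect. Executability is chosen once at reserve time (adding MAP_JIT
where the platform defines it), and the commit function has no exec parameter at all. Whether a given hardened
kernel accepts the later mprotect on an exec mapping is not something this sketch settles.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical descriptor: executability is fixed when the mapping is created.
struct Mapping {
  void*  base;   // nullptr if the reservation failed
  size_t size;
  bool   exec;   // decided once in reserve_mapping(), never changed afterwards
};

// Reserve address space without access rights. The exec decision is made here.
inline Mapping reserve_mapping(size_t size, bool exec) {
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_JIT
  // Only defined on platforms that have it (e.g. macOS); assumed to be
  // settable only at mmap time, as discussed in this thread.
  if (exec) flags |= MAP_JIT;
#endif
  void* p = mmap(nullptr, size, PROT_NONE, flags, -1, 0);
  if (p == MAP_FAILED) p = nullptr;
  return Mapping{p, size, exec};
}

// Make a page-aligned sub-range accessible. Note there is no 'exec' argument:
// the protection is derived from the mapping, so mixed commits are impossible.
inline bool commit_range(const Mapping& m, size_t offset, size_t len) {
  if (m.base == nullptr || offset > m.size || len > m.size - offset) return false;
  int prot = PROT_READ | PROT_WRITE | (m.exec ? PROT_EXEC : 0);
  return mprotect(static_cast<char*>(m.base) + offset, len, prot) == 0;
}

// Release the whole mapping.
inline bool release_mapping(const Mapping& m) {
  return m.base != nullptr && munmap(m.base, m.size) == 0;
}
```

With this shape, a caller would reserve the code cache with exec = true and everything else with exec = false, and
no bookkeeping of per-range exec state is needed.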

-------------

PR: https://git.openjdk.java.net/jdk/pull/294