RFR: 8234930: Use MAP_JIT when allocating pages for code cache on macOS [v3]

Fri Oct 9 06:08:20 UTC 2020

On Wed, 7 Oct 2020 18:42:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Sorry, I had not highlighted that was a proof-of-concept patch to show API changes. I've pushed another PoC with
>> bookkeeping and no API changes at all. But I don't like the new one either. In the new patch, there is a list of
>> (potentially) executable regions that is updated on commit, when the actual desired (non)exec mode become known. If we
>> support mixed exec/non-exec commits in a mapping, then after non-exec commit a part of the mapping cannot be reversed
>> to a potentially executable one (as we've lost MAP_JIT). Then it can produce some unexpected results under _some_
>> conditions in runtime, while API users can be unconscious about potential issues. Good API should not allow that.
>>>  specifying exec ... on per-mapping level it should be enough.
>> 
>> With this, it is possible to simplify the implementation without API changes. But it will still be 1) reserve and be
>> prepare for the first exec or non-exec commit 2) on commit, finish reserve and turn the mapping to the exec or
>> non-exec. All this instead of taking direct parameter "this is a executable mapping" on reserve.  The current "commit
>> only knows about exec" is just a leak of implementation details, as before it was only required to know executable
>> mode. Providing exec parameter to reserve will just bring consistency to the interface. Or, a separate interface for
>> exec (code) mappings will serve the same and will be better, as it will simplify the general non-code reserve/commit
>> interface.
>>> I also see at least three separate cases where we establish a mapping and later need mapping-specific information
>>> somewhere until the next interaction - be it commit/uncommit or release: [ AIX SystemV or mmap, Linux THP, macOS
>>> MAP_JIT for code ]
>> 
>> Could you explain how the choice between SysV and mmap is made on AIX? It looks like
>>   
>> develop(uintx, Use64KPagesThreshold, 0,                                           \
>>           "4K/64K page allocation threshold.")                                      \
>> ...
>>   if (os::vm_page_size() == 4*K) {
>>     return reserve_mmaped_memory(bytes, NULL /* requested_addr */);
>>   } else {
>>     if (bytes >= Use64KPagesThreshold) {
>>       return reserve_shmated_memory(bytes, NULL /* requested_addr */);
>>     } else {
>>       return reserve_mmaped_memory(bytes, NULL /* requested_addr */);
>>     }
>>   }
>> (there only two calls to reserve_shmated_memory and both of them are like above. Is SysV SHM used in product builds?)
>> For now, the AIX case looks a bit different. The choice is made by the platform and the shared code cannot control
>> this. So yes, I cannot see how to avoid handle_t or similar.  In contrast, THP and MAP_JIT are the way to implement a
>> request from the shared code. Even for THP, shared code seems to know why it should "realign" (not sure why commit has
>> an alignment_hint parameter, while it is possible to realign after a regular commit). I assume there is enough context
>> in the shared code that can be provided for platform functions, without a handle_t. And the same context should anyway
>> be provided to reserve function, so handle_t can be filled with all necessary information.
>
>> 
>> Could you explain how the choice between SysV and mmap is made on AIX? It looks like
>> 
>> ```
>> develop(uintx, Use64KPagesThreshold, 0,                                           \
>>           "4K/64K page allocation threshold.")                                      \
>> ...
>>   if (os::vm_page_size() == 4*K) {
>>     return reserve_mmaped_memory(bytes, NULL /* requested_addr */);
>>   } else {
>>     if (bytes >= Use64KPagesThreshold) {
>>       return reserve_shmated_memory(bytes, NULL /* requested_addr */);
>>     } else {
>>       return reserve_mmaped_memory(bytes, NULL /* requested_addr */);
>>     }
>>   }
>> ```
>> 
>> (there only two calls to reserve_shmated_memory and both of them are like above. Is SysV SHM used in product builds?)
>> For now, the AIX case looks a bit different. The choice is made by the platform and the shared code cannot control
>> this. So yes, I cannot see how to avoid handle_t or similar.
> 
> On AIX we have 4K and 64K pages (actually more but those are interesting). 64K pages are desireable for larger areas
> like heap. 64K pages can only be allocated with SystemV shared memory. mmap'ed memory is always 4K paged. But SystemV
> shared mem has a number of disadvantages, like inability to protect the memory, and a large attach alignment (256M). So
> it is cumbersome.  os::vm_page_size() on AIX is a fake. The hotspot code assumes that the underlying Operating System
> has some sort of "base page size" (usually what is returned by sysconf(_SC_PAGESIZE)), and then optionally some sort of
> huge page size which follows different rules (e.g. pinned). On Aix things are more fluid.   When investigating 64K page
> support on AIX I decided eventually to fool hotspot into thinking that the base page size is 64k. Long story, this was
> way before the OpenJDK existed and this was a propietary code base with no possibilty of changing things upstream.
> Therefore os::vm_page_size returns 64K ("64K fake mode"). This can be disabled.  So above code fragment uses mmaped
> memory if 64K fake mode is disabled, and if it is enabeld, it uses mmap for smaller regions and shmget for larger ones.
>> 
>> In contrast, THP and MAP_JIT are the way to implement a request from the shared code. Even for THP, shared code seems
>> to know why it should "realign" (not sure why commit has an alignment_hint parameter, while it is possible to realign
>> after a regular commit). I assume there is enough context in the shared code that can be provided for platform
>> functions, without a handle_t. And the same context should anyway be provided to reserve function, so handle_t can be
>> filled with all necessary information.
> 
> I believe the alignment hint and the TPH code had their roots in Solaris code. So its current form (I guess) is heavily
> warped by history. A new implementation would maybe just have a "os::set_tph(start, size)" function and leave it at
> that. And yes, I do not think it is necessary for os::commit to do this.  In fact, Linux could probably set TPH
> unconditionally always when UseTransparentHugePages is active. That would alleviate the need for the alignment_hint
> parameter and the realign function.  I opened https://bugs.openjdk.java.net/browse/JDK-8253890 to follow up on this.

(more comments)

> Sorry, I had not highlighted that was a proof-of-concept patch to show API changes. I've pushed another PoC with
> bookkeeping and no API changes at all. But I don't like the new one either.

Interesting idea, but IMHO too heavvy weight for a platform only change. Also GrowableArray maybe not the best choice
here since e.g. it requires you to search twice on add. A better solution may be a specialized BST. If there are other
uses for such a solution (managing memory regions, melting them together, splitting them maybe on remove) this would be
worth a generic class. I believe NMT does something similar when managing virtual memory regions, see
VirtualMemoryTracker and friends.

> In the new patch, there is a list of (potentially) executable regions that is updated on commit, when the actual
> desired (non)exec mode become known. If we support mixed exec/non-exec commits in a mapping, then after non-exec commit
> a part of the mapping cannot be reversed to a potentially executable one (as we've lost MAP_JIT).

So once you cleared MAP_JIT from a region you cannot re-apply it? Then this is another reason we should not support
setting and clearing exec on commit but only on a per-mapping base.

> Then it can produce some unexpected results under _some_ conditions in runtime, while API users can be unconscious
> about potential issues. Good API should not allow that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/294