RFR: 8341334: CDS: Parallel relocation [v6]

Wed Nov 6 09:47:31 UTC 2024

On Tue, 5 Nov 2024 16:13:16 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> src/hotspot/share/cds/filemap.cpp line 1758:
>> 
>>> 1756:     char* start = _from + MIN2(_bytes, _bytes * chunk / max_chunks);
>>> 1757:     char* end   = _from + MIN2(_bytes, _bytes * (chunk + 1) / max_chunks);
>>> 1758:     os::pretouch_memory(start, end);
>> 
>> What happens if I have many cores and a small memory range? We would have many workers for a potentially smallish total range. Could start-end then end up being tiny? 
>> 
>> On Linux, we would do madvise MADV_POPULATE_WRITE. Could we end up feeding invalid range lengths to madvise, i.e. lengths that are not page aligned? Or could it just be inefficient if many threads try to madvise the same overlapping areas (see the len calculation in os::pd_pretouch_memory)?
>
> Small memory range and lots of workers -- that's an interesting question. I think we want to cap the chunk size from below, at least by `os::page_size()`. Let me see if I can do this without messing things up.
> 
> `os::pretouch_memory` already does `MADV_POPULATE_WRITE` when supported (see JDK-8315923). The key thing for fast, startup-time-sensitive pretouch is to eat the memory faults in multiple threads. It is arguably a kernel "issue" that `MADV_POPULATE_WRITE` is single-threaded, given that the kernel can _probably_ do this with kernel workers like we do it here on the JVM side, but that is not something I expect to be available any time soon.

Not relevant anymore, as I yanked the pretouch code from this PR.
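
For reference, the lower cap on chunk size discussed in the quoted comment could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code that was in the PR: `chunk_bounds` and its parameters are hypothetical names, it works on byte offsets rather than `char*` pointers, and `page_size` stands in for `os::page_size()`. It also assumes the base address itself is page-aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not HotSpot code): returns the [start, end) byte
// offsets of chunk number `chunk` when `bytes` are split among at most
// `max_chunks` workers. Each chunk is a whole number of pages and at
// least one page, so surplus workers simply get an empty range.
static std::pair<std::size_t, std::size_t> chunk_bounds(std::size_t bytes,
                                                        std::size_t chunk,
                                                        std::size_t max_chunks,
                                                        std::size_t page_size) {
  assert(max_chunks > 0 && page_size > 0);
  // Even share per worker, rounded up.
  std::size_t per_chunk = (bytes + max_chunks - 1) / max_chunks;
  // Round the share up to a page multiple and cap it from below at one page.
  std::size_t aligned = (per_chunk + page_size - 1) / page_size * page_size;
  std::size_t chunk_size = std::max(page_size, aligned);
  // Clamp both ends to the total size; only the last non-empty chunk can
  // end at an offset that is not page-aligned.
  std::size_t start = std::min(bytes, chunk * chunk_size);
  std::size_t end   = std::min(bytes, (chunk + 1) * chunk_size);
  return {start, end};
}
```

With 10000 bytes, 64 workers and 4096-byte pages, chunks 0 to 2 cover `[0, 4096)`, `[4096, 8192)` and `[8192, 10000)`, and every later chunk is empty, so no two workers touch the same page.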

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21302#discussion_r1830688280