Re: [ZGC] [aarch64] Unable to allocate heap for certain Linux kernel configurations
On 2020-08-28 11:38, Christoph Göttschkes wrote:
Hi Stefan, thanks for your feedback. Looks like this case isn't as exotic as I first thought. May I ask which kind of machine this is? Also a small embedded board?
It was one of our compiler devs that ran into this. I don't think it was a small machine, but rather one that was configured differently than other AArch64 machines that we've run on.
On 2020-08-28 11:04, Stefan Karlsson wrote:
I think we hit a very similar problem during some internal testing on one machine. I have a patch to work around that problem:
https://cr.openjdk.java.net/~stefank/prototype/zaarch-va/webrev.01/
Your patch works with some modifications. In my case, only 39 bits are available in the virtual address space. I put that value as "va_bits" and it works.
OK. Good to know.
Unfortunately, this patch only solves the problem on a very specific setup, and I don't think it covers your use-case. Hopefully, someone with enough AArch64 machine config knowledge would be able to extend this patch to also cover all possible combinations.
I don't think the possible combinations are the problem. [1] shows them (sorry, didn't put that link in the last mail) for Linux. I think the real problem is detecting this and making the addressing scheme adjust itself.
Maybe there could be a mechanism which tries to allocate memory beyond certain addresses to try and detect the number of bits available? On my machine, for instance, the ZGC implementation tries to allocate memory with different starting addresses, but always gets an address back which is way smaller (because of the kernel limitations). Maybe, the ZGC implementation could store this information (the number of bits in addresses returned by mmap) and use this information to try and make another loop, which tries to allocate the heap with a reduced number of bits used for the addresses. This could also be a HotSpot option, to speed things up during startup if one knows that the machine uses a "weird" configuration.
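The idea in the paragraph above, reading the usable address width off the address that mmap actually returns, could look roughly like the following. This is only a sketch under assumptions: the helper names significant_bits and observed_address_bits are made up for illustration and are not part of ZGC or of the linked patch, and the result is only a lower bound, since the kernel is free to return a lower address than it could have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Number of bits needed to represent addr (index of the highest set bit + 1).
static unsigned significant_bits(uintptr_t addr) {
  unsigned bits = 0;
  while (addr != 0) {
    bits++;
    addr >>= 1;
  }
  return bits;
}

// Hypothetical probe: ask the kernel for one page at 'hint' (a hint only, no
// MAP_FIXED) and report how many bits the returned address needs.
// Returns 0 if the mapping attempt fails.
static unsigned observed_address_bits(uintptr_t hint) {
  const long page_size = sysconf(_SC_PAGESIZE);
  assert(page_size > 0);
  const size_t size = static_cast<size_t>(page_size);

  void* const res = mmap(reinterpret_cast<void*>(hint), size, PROT_NONE,
                         MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
  if (res == MAP_FAILED) {
    return 0;
  }
  const unsigned bits = significant_bits(reinterpret_cast<uintptr_t>(res));
  munmap(res, size);  // The probe mapping is only needed for its address.
  return bits;
}
```

On a kernel configured with a 39-bit virtual address space, a call such as observed_address_bits(uintptr_t(1) << 46) cannot report more than 39 bits, which is the signal the addressing scheme would need in order to retry with fewer bits.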
I think creating a bug report would be a good start. Do you have an openjdk user name? If not I can create a bug report.
Yes, I can create bug reports. I used my first mail and created one [2].
Thanks! We have had some brief discussions with Stuart (CC:ed), who created the AArch64 port, about this problem. Maybe he has had some time to think about it, and has some additional insights or ideas.
Thanks, StefanK
-- Christoph
[1] https://urldefense.com/v3/__https://www.kernel.org/doc/Documentation/arm64/m...
[2] https://bugs.openjdk.java.net/browse/JDK-8252500
(Updated Stuart's mail address) On 2020-08-28 12:42, Stefan Karlsson wrote:
On 28/08/2020 11:42, Stefan Karlsson wrote:
Thanks! We have had some brief discussions with Stuart (CC:ed), who created the AArch64 port, about this problem. Maybe he has had some time to think about it, and has some additional insights or ideas.
Thanks, StefanK
-- Christoph
[1] https://urldefense.com/v3/__https://www.kernel.org/doc/Documentation/arm64/m...
[2] https://bugs.openjdk.java.net/browse/JDK-8252500
Hi,
That's right - I have been exploring options on this, and I had a similar solution at one point to finding the address space size. From speaking with people familiar with the arm64 Linux kernel, there is no good way to query the available address space except for probing it and testing what is there.
Thinking we could do with a general-purpose routine, I experimented with a routine that forks the process and probes the address space non-destructively. MAP_FIXED implicitly destroys any existing mappings. Of course, ZGC mmaps memory at fixed addresses anyhow, so the concern about embedding the JVM in your program and destroying existing mappings turned out to be moot, as we'd be doing that anyway.
CCing Monica as the Windows platform might have similar issues.
Thanks, Stuart
BR, Stuart
On 2020-08-28 16:56, Stuart Monteith wrote:
Hi, That's right - I have been exploring options on this, and I had a similar solution at one point to finding the address space size. From speaking with people familiar with the arm64 Linux kernel, there is no good way to query the available address space except for probing it and testing what is there. Thinking we could do with a general-purpose routine, I experimented with a routine that forks the process and probes the address space non-destructively. MAP_FIXED implicitly destroys any existing mappings. Of course, ZGC mmaps memory at fixed addresses anyhow, so the concern about embedding the JVM in your program and destroying existing mappings turned out to be moot, as we'd be doing that anyway.
Maybe I misunderstand this point: we use fixed addresses when we probe the address space, but we don't use MAP_FIXED:
  static bool map(uintptr_t start, size_t size) {
    const void* const res = mmap((void*)start, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE, -1, 0);
  bool ZVirtualMemoryManager::reserve_contiguous_platform(uintptr_t start, size_t size) {
    // Reserve address views
    const uintptr_t marked0 = ZAddress::marked0(start);
    const uintptr_t marked1 = ZAddress::marked1(start);
    const uintptr_t remapped = ZAddress::remapped(start);
    if (!map(marked0, size)) {
StefanK
CCing Monica as the Windows platform might have similar issues.
Thanks, Stuart
BR, Stuart
* Stefan Karlsson:
Maybe I misunderstand this point, but we use fixed addresses when we probe the address space, but we don't use MAP_FIXED:
  static bool map(uintptr_t start, size_t size) {
    const void* const res = mmap((void*)start, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE, -1, 0);
  bool ZVirtualMemoryManager::reserve_contiguous_platform(uintptr_t start, size_t size) {
    // Reserve address views
    const uintptr_t marked0 = ZAddress::marked0(start);
    const uintptr_t marked1 = ZAddress::marked1(start);
    const uintptr_t remapped = ZAddress::remapped(start);
    if (!map(marked0, size)) {
Note that you can speed this up a little bit on recent-ish kernels if you use MAP_FIXED_NOREPLACE. Thanks, Florian
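A minimal sketch of what a MAP_FIXED_NOREPLACE based reservation might look like is shown below. It is not the ZGC code: reserve_exact_range is a hypothetical name, and the fallback #define uses the generic Linux UAPI value (the one used on x86 and arm64), which is an assumption for any other architecture. Kernels older than 4.17 do not know the flag and treat the address as a plain hint, so the returned address still has to be compared with the requested one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
// Generic Linux UAPI value (Linux 4.17+); assumed here for older libc headers.
#define MAP_FIXED_NOREPLACE 0x100000
#endif

// Hypothetical helper: reserve [start, start + size) only if the kernel can
// place the mapping exactly there without replacing an existing mapping.
static bool reserve_exact_range(uintptr_t start, size_t size) {
  assert(size > 0);  // mmap rejects zero-length mappings.

  void* const res = mmap(reinterpret_cast<void*>(start), size, PROT_NONE,
                         MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE |
                         MAP_FIXED_NOREPLACE,
                         -1, 0);
  if (res == MAP_FAILED) {
    // On 4.17+ kernels an overlap with an existing mapping fails with EEXIST.
    return false;
  }
  if (reinterpret_cast<uintptr_t>(res) != start) {
    // Older kernels ignore the flag and may place the mapping elsewhere.
    munmap(res, size);
    return false;
  }
  return true;
}
```

With the flag, a collision is reported immediately as an error, so the map-then-unmap round trip is only paid on kernels that do not recognize it.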
Hi Florian, On 2020-08-28 22:21, Florian Weimer wrote:
Note that you can speed this up a little bit on recent-ish kernels if you use MAP_FIXED_NOREPLACE.
I didn't realize that this would make the probing faster. I tested this by blocking out the address ranges we probe with:
  reserve_contiguous_platform(ZAddress::marked0(0), ZAddressOffsetMax);
  reserve_contiguous_platform(ZAddress::marked1(0), ZAddressOffsetMax);
  reserve_contiguous_platform(ZAddress::remapped(0), ZAddressOffsetMax);
and ran with and without MAP_FIXED_NOREPLACE. I could see a decrease in the worst-case times.
Is this something you want to provide a patch for? If not, I can create a JBS entry and send out a patch for review.
Thanks, StefanK
Thanks, Florian
* Stefan Karlsson:
Note that you can speed this up a little bit on recent-ish kernels if you use MAP_FIXED_NOREPLACE.
I didn't realize that this would make the probing faster. I tested this by blocking out the address ranges we probe with:
  reserve_contiguous_platform(ZAddress::marked0(0), ZAddressOffsetMax);
  reserve_contiguous_platform(ZAddress::marked1(0), ZAddressOffsetMax);
  reserve_contiguous_platform(ZAddress::remapped(0), ZAddressOffsetMax);
and ran with and without MAP_FIXED_NOREPLACE. I could see a decrease in the worst-case times. Is this something you want to provide a patch for? If not, I can create a JBS entry and send out a patch for review.
No, sorry, it was just a random comment from the sidelines. Florian
Hi Stuart. On 2020-08-28 16:56, Stuart Monteith wrote:
Hi, That's right - I have been exploring options on this, and I had a similar solution at one point to finding the address space size. From speaking with people familiar with the arm64 Linux kernel, there is no good way to query the available address space except for probing it and testing what is there. Thinking we could do with a general-purpose routine, I experimented with a routine that forks the process and probes the address space non-destructively. MAP_FIXED implicitly destroys any existing mappings. Of course, ZGC mmaps memory at fixed addresses anyhow, so the concern about embedding the JVM in your program and destroying existing mappings turned out to be moot, as we'd be doing that anyway.
After reading your comment, the only other viable solution I came up with is using a combination of msync and mmap. I haven't fully looked into this yet, but I made a small prototype to show you the idea and to get early feedback. Maybe someone has already looked into this and knows that some edge case doesn't work.
General idea: First use msync to check if there is already a mapping for the page. If msync returns ENOMEM, this means either that there is no mapping yet or that the address is invalid. After getting ENOMEM, we can use mmap to try and map the page. If we get back the same address, we know the address is valid. If the address is different, we know it is invalid. I also don't want to use MAP_FIXED. MAP_FIXED_NOREPLACE (as mentioned by Florian) might be a solution, but it was only introduced in Linux 4.17, so I guess we would need a fallback solution.
I tested this on different Linux aarch64 devices and it seems to work. You can find the prototype here:
https://cr.openjdk.java.net/~cgo/8252500/prototype-webrev.01/
-- Christoph
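The msync + mmap idea described above can be sketched as follows. This is an illustration written from the description in this mail, not the linked prototype: the function names, the MS_ASYNC flag choice, and the fallback of returning the lower bound when nothing is found are all assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Returns true if the page at addr is usable: either something is already
// mapped there, or the kernel agrees to place a new mapping exactly there.
static bool address_is_usable(uintptr_t addr, size_t page_size) {
  // Step 1: msync succeeds if the page is mapped, whatever kind of mapping.
  if (msync(reinterpret_cast<void*>(addr), page_size, MS_ASYNC) == 0) {
    return true;
  }
  if (errno != ENOMEM) {
    return false;  // Unexpected error (for example EINVAL); treat as unusable.
  }
  // Step 2: ENOMEM means "not mapped" or "invalid address". Ask for a mapping
  // with addr as a hint only (no MAP_FIXED) and see where it ends up.
  void* const res = mmap(reinterpret_cast<void*>(addr), page_size, PROT_NONE,
                         MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
  if (res == MAP_FAILED) {
    return false;
  }
  munmap(res, page_size);
  return reinterpret_cast<uintptr_t>(res) == addr;
}

// Hypothetical search: highest bit index in (min_bit, max_bit] for which the
// address 1 << bit is usable; falls back to min_bit if none is found.
static size_t probe_highest_address_bit(size_t max_bit, size_t min_bit) {
  assert(max_bit < sizeof(uintptr_t) * 8);
  const long page_size = sysconf(_SC_PAGESIZE);
  assert(page_size > 0);

  for (size_t bit = max_bit; bit > min_bit; bit--) {
    const uintptr_t addr = uintptr_t(1) << bit;
    if (address_is_usable(addr, static_cast<size_t>(page_size))) {
      return bit;
    }
  }
  return min_bit;
}
```

A result of N means that addresses with bit N set are usable, so the address space is at least N + 1 bits wide. Note that another thread could map the page between the msync and mmap calls, so a sketch like this is best run while the process is still single-threaded.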
On 2020-08-31 10:29, Christoph Göttschkes wrote:
General idea: First use msync to check if there is already a mapping for the page. If msync returns ENOMEM, this either means: there is no mapping yet, or the address is invalid. After getting ENOMEM, we can use mmap to try and map the page. If we get back the same address, we know the address is valid. If the address is different, we know it is invalid. I also don't want to use MAP_FIXED, but maybe MAP_FIXED_NOREPLACE (as mentioned by Florian) would be a solution, but it was only introduced in Linux 4.17, so I guess we would need a fallback solution.
I tested this on different Linux aarch64 devices and it seems to work. You can find the prototype here:
https://cr.openjdk.java.net/~cgo/8252500/prototype-webrev.01/
General approach seems fine to me. Maybe someone more well-versed in msync can chime in?
Some comments about the patch:
- Maybe check for EINVAL and/or add an assert to catch the odd error codes in debug builds?
- If no max bits are found, the subsequent code will underflow. I think returning some kind of lower-limit that works with the code in ZPlatformAddressOffsetBits would be prudent.
A few things that will make the code look more like HotSpot code:
- We usually use os::vm_page_size() instead of sysconf(_SC_PAGE_SIZE).
- 'sizeof(uintptr_t) * CHAR_BIT' maybe replace with BitsPerWord
- %zu use SIZE_FORMAT instead
Thanks, StefanK
-- Christoph
Thanks for your early feedback. I would also like to hear if someone knows of problems with msync being used like that. I ran some more tests and so far everything looks fine; I found no weird behavior. Regardless of the kind of mapping, msync always succeeds and only returns ENOMEM if the page is not mapped or the address is invalid.
Yes, the patch was self-contained, since I basically copied it from a self-contained test program, only to share it with you and to quickly test it within the JVM, to see if the JVM already runs multi-threaded while the ZGC initialization is done. When I am done with my testing, I will create an RFR for the created bug with this technique to get some more feedback on the topic.
Thanks, Christoph
On 2020-08-31 12:47, Stefan Karlsson wrote:
General approach seems fine to me. Maybe someone more well-versed in msync can chime in?
Some comments about the patch:
- Maybe check for EINVAL and/or add an assert to catch the odd error codes in debug builds?
- If no max bits are found, the subsequent code will underflow. I think returning some kind of lower-limit that works with the code in ZPlatformAddressOffsetBits would be prudent.
A few things that will make the code look more like HotSpot code:
- We usually use os::vm_page_size() instead of sysconf(_SC_PAGE_SIZE).
- 'sizeof(uintptr_t) * CHAR_BIT' maybe replace with BitsPerWord
- %zu use SIZE_FORMAT instead
Thanks, StefanK
-- Christoph
On 02/09/2020 08:12, Christoph Göttschkes wrote:
Thanks for your early feedback. I also would like to hear if someone knows some problems with msync being used like that. I made some more tests and until now, everything looks fine, no weird behavior found. Regardless of the kind of mapping, msync always succeeds and only returns ENOMEM if the page is not mapped, or the address is invalid.
Yes, the patch was self contained, since I basically copied it from a self contained test program, only to share it with you and to quickly test it within the JVM, to see if the JVM already runs multi threaded while the ZGC initialization is done. When I am done with my testing, I will create an RFR for the created bug with this technique to get some more feedback on the topic.
Thanks, Christoph
Hi,
Your approach + Stefan's comments look reasonable to me. I tested your solution standalone on a Raspberry Pi 4 with 39-bit VA and on a workstation with 48-bit addressing.
The configurations we support in Linux on aarch64 are:
  4KB pages: 39, 48 bits
  16KB pages: 36, 47, 48 bits
  64KB pages: 42, 48, 52 bits
Only the 4KB and 64KB configurations are worth considering. I don't believe there are 16KB page configurations out there, so you could cut the range down to a 52 to 39 bits search.
For completeness I should mention that this potentially could be cross-platform code. 5-level paging was added to x86, so they will have 57 bits available. ppc/s390x don't have ZGC ports yet.
My advice would be to stick with 48 bits for now, as there may be extra considerations with the 52/57-bit address spaces.
BR, Stuart
On 2020-09-02 17:59, Stuart Monteith wrote:
Hi, Your approach + Stefan's comments look reasonable to me. I tested your solution standalone on a Raspberry Pi 4 with 39-bit VA and on a workstation with 48-bit addressing.
The configurations we support in Linux on aarch64 are:
  4KB pages: 39, 48 bits
  16KB pages: 36, 47, 48 bits
  64KB pages: 42, 48, 52 bits
Only the 4KB and 64KB configurations are worth considering, I don't believe there are 16KB page configurations out there, so you could cut the range down to a 52 to 39 bits search.
For completeness I should mention that this potentially could be cross platform code. 5-level paging was added to x86, so they will have 57 bits available. ppc/s390x don't have ZGC ports yet.
My advice would be to stick with 48 bits for now, as there may be extra considerations with the 52/57 bit address spaces.
Thanks for looking into this. I am about to post my RFR for this bug. I already included some of your suggestions (with some slight differences), but let's continue the discussion in the RFR thread.
-- Christoph
participants (4)
- Christoph Göttschkes
- Florian Weimer
- Stefan Karlsson
- Stuart Monteith