RFR: 8259643: ZGC can return metaspace OOM prematurely [v2]

Erik Österlund eosterlund at openjdk.java.net
Fri Jan 29 16:56:44 UTC 2021


On Fri, 29 Jan 2021 11:23:30 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Hi Thomas,
>> 
>> Thanks for chiming in! I will reply inline.
>> 
>>> Hi Erik,
>>> 
>>> lets see if I understand the problem:
>>> 
>>> 1 n threads allocate metaspace
>>> 2 thread A gets an allocation error (not HWM but a hard limit)
>>> 3 .. returns, (eventually) schedules a synchronous GC.
>>> 4 .. gc runs, the CLDG is at some point purged, releases metaspace pressure
>>> 5 other threads load classes, allocating metaspace concurrently, benefitting from the released pressure
>>> 6 thread A reattempts allocation, fails again.
>> 
>> That's the one.
>> 
>>> This is normally not a problem, no? 
>> 
>> Indeed, and that was my gut feeling when the current handling was written. I wouldn't expect an actual application to ever hit this problem. Nevertheless, I think it's a soundness problem with completely unbounded starvation, even though it doesn't happen in real life. So I think I would still like to fix the issue.
>> 
>> It is definitely a bit arbitrary though where we decide to draw the line of what we guarantee, and what the balance between exactness and maintainability should be. My aim is to try hard enough that we don't rely on luck (random sleeps) to decide whether a program that shouldn't even be close to OOM fails, even though you have to be *very* unlucky for it to fail. But I am not interested in perfect guarantees either, down to the last allocation. My balance is that I allow throwing OOM prematurely if we are "really close" to being down to the last-allocation OOM, but if you are not "really close", then no sleep in the world should cause a failure.
>> 
>>> Which thread exactly gets the OOM if the VM hovers that close to the limit does not really matter. But you write "This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM." - so we could get OOMs even if most of the Metaspace were vacant? This can only happen if, between (4) and (6), other threads not only allocate metaspace, but also then lose the loaders used for those allocations, too late for the GC at (4) to collect them but before (5). Collecting them would require another GC.
>> 
>> Right, and that is indeed what the test does. It loads chunks of 1000 classes and releases them, assuming that surely after releasing them, I can allocate more classes. Unless of course starvation ruins the day.
>> 
>>> In other words, the contract is that we only throw an OOM if we really tried everything, but since the effects of the first GC are "stale" it does not count as try?
>> 
>> The previous contract was that we try to allocate again after a full GC, and if that fails we give up. The issue is that this guarantee allows the GC to free up 99% of metaspace, yet still fail the allocation due to races with other threads doing the same thing. So at any given point, it might be that only 1% of metadata is reachable, yet an OOM can be produced if you are "unlucky".
>> 
>>> Do you think this is a realistic problem?
>> 
>> It is realistic enough that one stress test has failed in the real world. Yet I don't think any application out there will run into any issue. But I prefer having a sound solution where we can know that and not rely on probability.
>> 
>>> Do I understand your patch right in that you divide allocations in two priority classes, add another lock, MetaspaceCritical_lock, which blocks normal allocations as long as critical allocations are queued?
>> 
>> Yes, that's exactly right.
>> 
>>> Sorry if I am slow :)
>> 
>> Not at all!
>> 
>>> One problem I see is that Metaspace::purge is not the full purge. Reclaiming metaspace happens in two stages:
>>> 
>>>     1. in CLDG::purge, we delete all `ClassLoaderMetaspace` objects belonging to dead loaders. This releases all their metaspace to the freelists, optionally uncommitting portions of it (since JEP387).
>>> 
>>>     2. in Metaspace::purge, we go through Metaspace and munmap any mappings which are now completely vacant.
>>> 
>>> 
>>> The metaspace pressure release already happens in (1), so any concurrent thread allocating will benefit already.
>> 
>> Ah. I thought it was all done in 2. I can move the Big Fat Lock to cover all of CLDG::purge instead. What do you think? It just needs to cover the entire thing basically.
>> 
>>> Why do we even need a queue? Why could we not just let the first thread attempting a synchronous gc block metaspace allocation path for all threads, including others running into a limit, until the gc is finished and it had its first-allocation-right served?
>> 
>> Each "critical" allocation rides on one particular GC cycle, that denotes the make-or-break point of the allocation.
>> In order to prevent starvation, we have to satisfy all critical allocations that have their make-or-break GC cycle associated with the current purge() operation, before we release the lock in purge(), letting new allocations in, or we will rely on luck again. However, among the pending critical allocations, they will all possibly have different make-or-break GC cycles associated with them. So in purge() some of them need to be satisfied, and others do not, yet can happily get their allocations satisfied opportunistically if possible. So we need to make sure they are ordered somehow, such that the earliest arriving pending critical allocations are satisfied first, before subsequent critical allocations (possibly waiting for a later GC), or we can get premature OOM situations again, where a thread releases a bunch of memory, expecting to be able to allocate, yet fails due to races with various threads.
>> The queue basically ensures the ordering of critical allocation satisfaction is sound, so that the pending critical allocations with the associated make-or-break GC being the one running purge(), are satisfied first, before satisfying (opportunistically) other critical allocations, that are really waiting for the next GC to happen.
>> 
>> Thanks,
>> 
>> 
>>> Thanks, Thomas
>
> Hi Erik,
> 
> thanks for the extensive explanations!
> 
> One issue with your patch just came to me: the block-on-allocate may be too early. `Metaspace::allocate()` is a bit hot. I wonder about the performance impact of taking and releasing a lock on each atomic allocation, even if it's uncontended. Ideally I'd like to keep this path as close to a simple pointer-bump allocation as possible (which it isn't, unfortunately).
> 
> It is also not necessary: the majority of callers satisfy their allocation from already-committed arena-local memory. So they are good and no thieves. We would block them unnecessarily. I estimate only about 1:60 to 1:1000 calls would need that lock.
> 
> Allocation happens (roughly) in these steps:
> 1 try allocate from arena local free block list
> 2 try allocate from arena local current chunk without newly committing memory
> 3 try enlarge the chunk in place and/or commit more chunk memory and allocate from current chunk
> 4 get a new chunk from the freelist
> 
> (1) and (2) don't bother anyone. Hot path is typically (2). From (3) onward concurrently released memory could be used. So (1) and (2) can still happen before your block.
> 
> All that happens inside `MetaspaceArena::allocate`:
> 
> https://github.com/openjdk/jdk/blob/0675473486bc0ee321654d308b600874cf5ce41e/src/hotspot/share/memory/metaspace/metaspaceArena.cpp#L225
> 
> But code-wise (2) and (3) are a bit entangled, so the code would have to be massaged a bit to clearly separate (2) from (3).
> 
> Please find more remarks inline.
> 
>> Hi Thomas,
>> 
>> Thanks for chiming in! I will reply inline.
>> 
>> > Hi Erik,
>> > lets see if I understand the problem:
>> > 1 n threads allocate metaspace
>> > 2 thread A gets an allocation error (not HWM but a hard limit)
>> > 3 .. returns, (eventually) schedules a synchronous GC.
>> > 4 .. gc runs, the CLDG is at some point purged, releases metaspace pressure
>> > 5 other threads load classes, allocating metaspace concurrently, benefitting from the released pressure
>> > 6 thread A reattempts allocation, fails again.
>> 
>> That's the one.
>> 
>> > This is normally not a problem, no?
>> 
>> Indeed, and that was my gut feeling when the current handling was written. I wouldn't expect an actual application to ever hit this problem. Nevertheless, I think it's a soundness problem with completely unbounded starvation, even though it doesn't happen in real life. So I think I would still like to fix the issue.
>> 
>> It is definitely a bit arbitrary though where we decide to draw the line of what we guarantee, and what the balance between exactness and maintainability should be. My aim is to try hard enough that we don't rely on luck (random sleeps) to decide whether a program that shouldn't even be close to OOM fails, even though you have to be _very_ unlucky for it to fail. But I am not interested in perfect guarantees either, down to the last allocation. My balance is that I allow throwing OOM prematurely if we are "really close" to being down to the last-allocation OOM, but if you are not "really close", then no sleep in the world should cause a failure.
> 
> I think I get this now. IIUC we have the problem that memory release happens delayed, and we carry around a baggage of "potentially free" memory which needs a collector run to materialize. So many threads jumping up and down and doing class loading and unloading drive up metaspace use rate and increase the "potentially free" overhead, right? So the thing is to time collector runs right.
> 
> One concern I have is that if the customer runs with too tight a limit, we may bounce from full GC to full GC, always scraping the barrel enough to just keep going - maybe collecting some short-lived loaders - but not enough to get the VM into clear waters. I think this may be an issue today already. What is unclear to me is when it would be just better to give up and throw an OOM, to motivate the customer to increase the limits.
> 
>> 
>> > Which thread exactly gets the OOM if the VM hovers that close to the limit does not really matter. But you write "This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM." - so we could get OOMs even if most of the Metaspace were vacant? This can only happen if, between (4) and (6), other threads not only allocate metaspace, but also then lose the loaders used for those allocations, too late for the GC at (4) to collect them but before (5). Collecting them would require another GC.
>> 
>> Right, and that is indeed what the test does. It loads chunks of 1000 classes and releases them, assuming that surely after releasing them, I can allocate more classes. Unless of course starvation ruins the day.
>> 
>> > In other words, the contract is that we only throw an OOM if we really tried everything, but since the effects of the first GC are "stale" it does not count as try?
>> 
>> The previous contract was that we try to allocate again after a full GC, and if that fails we give up. The issue is that this guarantee allows the GC to free up 99% of metaspace, yet still fail the allocation due to races with other threads doing the same thing. So at any given point, it might be that only 1% of metadata is reachable, yet an OOM can be produced if you are "unlucky".
>> 
>> > Do you think this is a realistic problem?
>> 
>> It is realistic enough that one stress test has failed in the real world. Yet I don't think any application out there will run into any issue. But I prefer having a sound solution where we can know that and not rely on probability.
>> 
>> > Do I understand your patch right in that you divide allocations in two priority classes, add another lock, MetaspaceCritical_lock, which blocks normal allocations as long as critical allocations are queued?
>> 
>> Yes, that's exactly right.
>> 
>> > Sorry if I am slow :)
>> 
>> Not at all!
>> 
>> > One problem I see is that Metaspace::purge is not the full purge. Reclaiming metaspace happens in two stages:
>> > ```
>> > 1. in CLDG::purge, we delete all `ClassLoaderMetaspace` objects belonging to dead loaders. This releases all their metaspace to the freelists, optionally uncommitting portions of it (since JEP387).
>> > 
>> > 2. in Metaspace::purge, we go through Metaspace and munmap any mappings which are now completely vacant.
>> > ```
>> > 
>> > 
>> > The metaspace pressure release already happens in (1), so any concurrent thread allocating will benefit already.
>> 
>> Ah. I thought it was all done in 2. I can move the Big Fat Lock to cover all of CLDG::purge instead. What do you think? It just needs to cover the entire thing basically.
> 
> Why not just cover the whole synchronous GC collect call? I'd put that barrier up as early as possible, to prevent as many threads as possible from entering the more expensive fail path. At that point we know we are near exhaustion. Any thread allocating could just as well wait inside MetaspaceArena::allocate. If the GC succeeds in releasing lots of memory, they will not have been disturbed much.
> 
>> 
>> > Why do we even need a queue? Why could we not just let the first thread attempting a synchronous gc block metaspace allocation path for all threads, including others running into a limit, until the gc is finished and it had its first-allocation-right served?
>> 
>> Each "critical" allocation rides on one particular GC cycle, that denotes the make-or-break point of the allocation.
> 
> I feel like I should know this, but if multiple threads enter satisfy_failed_metadata_allocation around the same time and call a synchronous collect(), they would wait on the same GC, right? They won't start individual GCs for each thread?
> 
>> In order to prevent starvation, we have to satisfy all critical allocations that have their make-or-break GC cycle associated with the current purge() operation, before we release the lock in purge(), letting new allocations in, or we will rely on luck again. However, among the pending critical allocations, they will all possibly have different make-or-break GC cycles associated with them. So in purge() some of them need to be satisfied, and others do not, yet can happily get their allocations satisfied opportunistically if possible. So we need to make sure they are ordered somehow, such that the earliest arriving pending critical allocations are satisfied first, before subsequent critical allocations (possibly waiting for a later GC), or we can get premature OOM situations again, where a thread releases a bunch of memory, expecting to be able to allocate, yet fails due to races with various threads.
>> The queue basically ensures the ordering of critical allocation satisfaction is sound, so that the pending critical allocations with the associated make-or-break GC being the one running purge(), are satisfied first, before satisfying (opportunistically) other critical allocations, that are really waiting for the next GC to happen.
> 
> I still don't get why the order of the critical allocations matters. I understand that even with your barrier, multiple threads can fail the initial allocation, enter the "satisfy_failed_metadata_allocation()" path, and now their allocations count as critical, since if they fail again they will throw an OOM. But why does the order of the critical allocations among themselves matter? Why not just let the critical allocations trickle out unordered? Is the relation to the GC cycle not arbitrary?
> 
>> 
>> Thanks,
>> /Erik
>> 
> 
> We need an "erik" pr command :)
> 
> Just FYI, I have very vague plans to extend usage of the metaspace allocator to other areas, to get back the cost of implementation. E.g. one candidate may be replacement of the current resource areas, which are just more primitive arena-based allocators. This is very vague and not a high priority, but I am a bit interested in keeping the coding generic if it's not too much effort. But I don't see your patch causing any problems there.
> 
> Cheers, Thomas

Hi Thomas,

Thanks for having a look at this.

> Hi Erik,
> 
> thanks for the extensive explanations!
> 
> One issue with your patch just came to me: the block-on-allocate may be too early. `Metaspace::allocate()` is a bit hot. I wonder about the performance impact of taking and releasing a lock on each atomic allocation, even if it's uncontended. Ideally I'd like to keep this path as close to a simple pointer-bump allocation as possible (which it isn't, unfortunately).

I have a global flag that denotes there being at least 1 critical allocation in the system. It is set by the first critical allocation, and cleared by the GC if all critical allocations could be satisfied. The global lock is only taken in Metaspace::allocate() if said flag is on. Normal apps should never see this flag being on. So the only overhead in the common case is to check the flag and see that no locking is required. I think that should be fast enough, right? And when you are in the critical mode, you obviously have more to worry about than lock contention, in terms of how the system is performing.
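To make the fast-path cost concrete, here is a minimal sketch of the flag-guarded locking scheme described above. All names (`g_has_critical_allocations`, the mutex standing in for `MetaspaceCritical_lock`) are hypothetical illustrations, not the actual HotSpot code:

```cpp
#include <atomic>
#include <mutex>

// Hypothetical sketch: the common case is a single atomic load; the lock
// is only taken once some thread has registered a critical allocation.
static std::atomic<bool> g_has_critical_allocations{false};
static std::mutex g_critical_lock; // stand-in for MetaspaceCritical_lock

bool allocation_must_block() {
  // Fast path: one acquire load, no lock, for apps that never get close
  // to the metaspace limit.
  if (!g_has_critical_allocations.load(std::memory_order_acquire)) {
    return false;
  }
  // Slow path: serialize with pending critical allocations. In the real
  // scheme the thread would wait here for the GC's purge to finish.
  std::lock_guard<std::mutex> guard(g_critical_lock);
  return true;
}

void register_critical_allocation() {
  std::lock_guard<std::mutex> guard(g_critical_lock);
  g_has_critical_allocations.store(true, std::memory_order_release);
}

// Called by the GC once all pending critical allocations are satisfied.
void clear_critical_allocations() {
  std::lock_guard<std::mutex> guard(g_critical_lock);
  g_has_critical_allocations.store(false, std::memory_order_release);
}
```

The point of the sketch is only that the uncontended common case costs one load and a branch.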

> It is also not necessary: the majority of callers satisfy their allocation from already-committed arena-local memory. So they are good and no thieves. We would block them unnecessarily. I estimate only about 1:60 to 1:1000 calls would need that lock.
> 
> Allocation happens (roughly) in these steps:
> 1 try allocate from arena local free block list
> 2 try allocate from arena local current chunk without newly committing memory
> 3 try enlarge the chunk in place and/or commit more chunk memory and allocate from current chunk
> 4 get a new chunk from the freelist
> 
> (1) and (2) don't bother anyone. Hot path is typically (2). From (3) onward concurrently released memory could be used. So (1) and (2) can still happen before your block.
> 
> All that happens inside `MetaspaceArena::allocate`:
> 
> https://github.com/openjdk/jdk/blob/0675473486bc0ee321654d308b600874cf5ce41e/src/hotspot/share/memory/metaspace/metaspaceArena.cpp#L225
> 
> But code-wise (2) and (3) are a bit entangled, so the code would have to be massaged a bit to clearly separate (2) from (3).

It might be possible to move it to a less hot path as you propose, but given that the fast path doesn't take said lock unless anyone really has critical allocations in the system, I don't think we need to worry about that. But if you still have performance concerns, I am open to suggestions of course.
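For illustration, here is a toy model of the four allocation steps Thomas lists, with the barrier placed only in front of (3) and (4), the steps that could consume concurrently released memory. Everything here (names, fields, structure) is a hypothetical sketch, not the actual `MetaspaceArena::allocate` code:

```cpp
// Toy model: which of the four steps satisfies an allocation, and where a
// critical-allocation barrier could sit without touching the hot path.
enum class Step { FreeBlockList, CurrentChunk, EnlargeChunk, NewChunk, Fail };

struct ArenaModel {
  bool free_block_available = false; // step (1) possible
  bool chunk_has_room       = false; // step (2) possible
  bool can_commit_more      = false; // step (3) possible
  bool freelist_has_chunk   = false; // step (4) possible
  bool critical_pending     = false; // the global flag from the discussion

  Step allocate() {
    // (1) and (2): arena-local, never compete with critical allocations,
    // so they can run before any barrier.
    if (free_block_available) return Step::FreeBlockList;
    if (chunk_has_room)       return Step::CurrentChunk;
    // From here on the allocation could consume memory a GC just
    // released, so this is where a barrier would block (modeled as Fail;
    // the real code would wait on the lock instead).
    if (critical_pending)     return Step::Fail;
    if (can_commit_more)      return Step::EnlargeChunk;
    if (freelist_has_chunk)   return Step::NewChunk;
    return Step::Fail;
  }
};
```

The model just shows that the hot path, step (2), never observes the barrier.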

> Please find more remarks inline.
> 
> > Hi Thomas,
> > Thanks for chiming in! I will reply inline.
> > > Hi Erik,
> > > lets see if I understand the problem:
> > > 1 n threads allocate metaspace
> > > 2 thread A gets an allocation error (not HWM but a hard limit)
> > > 3 .. returns, (eventually) schedules a synchronous GC.
> > > 4 .. gc runs, the CLDG is at some point purged, releases metaspace pressure
> > > 5 other threads load classes, allocating metaspace concurrently, benefitting from the released pressure
> > > 6 thread A reattempts allocation, fails again.
> > 
> > 
> > That's the one.
> > > This is normally not a problem, no?
> > 
> > 
> > Indeed, and that was my gut feeling when the current handling was written. I wouldn't expect an actual application to ever hit this problem. Nevertheless, I think it's a soundness problem with completely unbounded starvation, even though it doesn't happen in real life. So I think I would still like to fix the issue.
> > It is definitely a bit arbitrary though where we decide to draw the line of what we guarantee, and what the balance between exactness and maintainability should be. My aim is to try hard enough that we don't rely on luck (random sleeps) to decide whether a program that shouldn't even be close to OOM fails, even though you have to be _very_ unlucky for it to fail. But I am not interested in perfect guarantees either, down to the last allocation. My balance is that I allow throwing OOM prematurely if we are "really close" to being down to the last-allocation OOM, but if you are not "really close", then no sleep in the world should cause a failure.
> 
> I think I get this now. IIUC we have the problem that memory release happens delayed, and we carry around a baggage of "potentially free" memory which needs a collector run to materialize. So many threads jumping up and down and doing class loading and unloading drive up metaspace use rate and increase the "potentially free" overhead, right? So the thing is to time collector runs right.

Exactly.

> One concern I have is that if the customer runs with too tight a limit, we may bounce from full GC to full GC, always scraping the barrel enough to just keep going - maybe collecting some short-lived loaders - but not enough to get the VM into clear waters. I think this may be an issue today already. What is unclear to me is when it would be just better to give up and throw an OOM, to motivate the customer to increase the limits.

Right - this is a bit of a philosophical one. There is always a balance there between precision, code complexity, and when to put the VM out of its misery when it is performing poorly. We deal with the same trade-off really with heap allocations, which is why I am also solving the starvation problem in pretty much the same way: with a queue satisfied by the GC, and locking out starvation. Then we throw OOM in fairly similar conditions. What they have in common is that when they throw, we will have a live portion of metaspace that is "pretty much" full, and there is no point in continuing, while allowing unfortunate timings on a less full (in terms of temporal liveness) metaspace to always succeed.

One might argue that the trade-off should be moved in some direction, and that it is okay for it to be more or less exact, but I was hoping that by doing the same dance that heap OOM situations do, we can at least follow a trade-off that is pretty well established and has worked pretty well for heap OOM situations for many years. And I think heap OOM situations in the wild are far more common than metaspace OOM, so I don't think that the metaspace OOM mechanism needs to do better than what the heap OOM mechanism does. If that makes sense.

> > > Which thread exactly gets the OOM if the VM hovers that close to the limit does not really matter. But you write "This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM." - so we could get OOMs even if most of the Metaspace were vacant? This can only happen if, between (4) and (6), other threads not only allocate metaspace, but also then lose the loaders used for those allocations, too late for the GC at (4) to collect them but before (5). Collecting them would require another GC.
> > 
> > 
> > Right, and that is indeed what the test does. It loads chunks of 1000 classes and releases them, assuming that surely after releasing them, I can allocate more classes. Unless of course starvation ruins the day.
> > > In other words, the contract is that we only throw an OOM if we really tried everything, but since the effects of the first GC are "stale" it does not count as try?
> > 
> > 
> > The previous contract was that we try to allocate again after a full GC, and if that fails we give up. The issue is that this guarantee allows the GC to free up 99% of metaspace, yet still fail the allocation due to races with other threads doing the same thing. So at any given point, it might be that only 1% of metadata is reachable, yet an OOM can be produced if you are "unlucky".
> > > Do you think this is a realistic problem?
> > 
> > 
> > It is realistic enough that one stress test has failed in the real world. Yet I don't think any application out there will run into any issue. But I prefer having a sound solution where we can know that and not rely on probability.
> > > Do I understand your patch right in that you divide allocations in two priority classes, add another lock, MetaspaceCritical_lock, which blocks normal allocations as long as critical allocations are queued?
> > 
> > 
> > Yes, that's exactly right.
> > > Sorry if I am slow :)
> > 
> > 
> > Not at all!
> > > One problem I see is that Metaspace::purge is not the full purge. Reclaiming metaspace happens in two stages:
> > > ```
> > > 1. in CLDG::purge, we delete all `ClassLoaderMetaspace` objects belonging to dead loaders. This releases all their metaspace to the freelists, optionally uncommitting portions of it (since JEP387).
> > > 
> > > 2. in Metaspace::purge, we go through Metaspace and munmap any mappings which are now completely vacant.
> > > ```
> > > 
> > > 
> > > The metaspace pressure release already happens in (1), so any concurrent thread allocating will benefit already.
> > 
> > 
> > Ah. I thought it was all done in 2. I can move the Big Fat Lock to cover all of CLDG::purge instead. What do you think? It just needs to cover the entire thing basically.
> 
> Why not just cover the whole synchronous GC collect call? I'd put that barrier up as early as possible, to prevent as many threads as possible from entering the more expensive fail path. At that point we know we are near exhaustion. Any thread allocating could just as well wait inside MetaspaceArena::allocate. If the GC succeeds in releasing lots of memory, they will not have been disturbed much.

Do you mean 
1) why I don't hold the MetaspaceCritical_lock across the collect() call at the mutator side of the code, or 
2) why I don't hold the MetaspaceCritical_lock across the entire GC cycle instead of purge?

I think you meant 2), so will answer that:
a) Keeping the lock across the entire GC cycle is rather problematic when it comes to constraining lock ranks. It would have to be above any lock ever needed in an entire GC cycle, yet we take a bunch of other locks that mutators hold at different points, interacting with class unloading. It would be very hard to find the right rank for this.
b) The GC goes in and out of safepoints, and sometimes needs to block out safepoints. Holding locks across safepoint boundaries while also blocking out safepoints is, in my experience, rather deadlock prone.
c) There is no need to keep the lock across more than the operation that frees metaspace memory, which in a concurrent GC always happens when safepoints are blocked out. If a mutator succeeds during a concurrent GC due to finding memory in a local free list or something, while another allocation failed and needs a critical allocation, then that is absolutely fine, as the successful allocation is never what causes the failing allocation to fail.

> > > Why do we even need a queue? Why could we not just let the first thread attempting a synchronous gc block metaspace allocation path for all threads, including others running into a limit, until the gc is finished and it had its first-allocation-right served?
> > 
> > 
> > Each "critical" allocation rides on one particular GC cycle, that denotes the make-or-break point of the allocation.
> 
> I feel like I should know this, but if multiple threads enter satisfy_failed_metadata_allocation around the same time and call a synchronous collect(), they would wait on the same GC, right? They won't start individual GCs for each thread?

The rule we have in ZGC to ensure what we want in terms of OOM situations is that GCs that are *requested* before an equivalent GC *starts* can be satisfied by the same GC cycle.

Let's go through what will likely happen in practice with my solution when you have, say, 1000 concurrent calls to satisfy a failing metaspace allocation.

1. Thread 1 registers its allocation, sees it is the first one, and starts a metaspace GC.

2. GC starts running

3. Threads 2-1000 register their allocations, and see that there was already a critical allocation before them. This causes them to wait for the GC to purge, opportunistically, riding on that first GC.

4. The GC satisfies allocations. For the sake of example, let's say that allocations 1-500 could be satisfied, but not the rest.

5. Threads 2-500 who were waiting for purge to finish, wake up, and run off happily with their satisfied allocations.

6. Threads 501-1000 wake up and see that their allocations didn't get satisfied. They now stop being opportunistic, and request a GC each before finally giving up.

7. The first GC cycle finishes.

8. Thread 1 wakes up after the entire first GC cycle is done and sees its satisfied allocation, happily running off with it.

9. The next GC cycle starts

10. The next GC cycle successfully satisfies metadata allocations for threads 501-750, but not the rest.

11. The entire next GC cycle finishes, satisfying the GC requests made by threads 501-1000, as they all *requested* a metaspace GC *before* it started running. Therefore, no further GC will run.

12. Threads 501-750 wake up, happily running off with their satisfied allocations.

13. Threads 751-1000 wake up, grumpy about the lack of memory after their GC. They are all gonna throw.

So basically, if they can all get satisfied with 1 GC, then 1 GC will be enough. But we won't throw until a thread has had a full GC run *after* it was requested, but multiple threads can ride on the same GC there. In this example, threads 2-1000 all ride on the same GC.

Note though, that we would never allow a GC that is already running to satisfy a GC request that comes in while the GC is already running, as we then wouldn't catch situations when a thread releases a lot of memory, and then expects it to be available just after.
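The requested-before-start rule above can be modeled with two monotonically increasing cycle counters. This is only a sketch of the bookkeeping, with hypothetical names, not the actual ZGC driver logic:

```cpp
#include <cstdint>

// Sketch of the rule: a GC request is satisfied only by a cycle that
// *started* at or after the request was made. A cycle already running
// when the request arrives does not count.
struct GCCycleModel {
  uint64_t started  = 0; // id of the most recently started cycle
  uint64_t finished = 0; // id of the most recently finished cycle

  // A thread stamps its request with the current 'started' value.
  uint64_t request() const { return started; }

  // For simplicity a cycle starts and finishes atomically here.
  void run_one_cycle() { ++started; finished = started; }

  // Satisfied once a cycle that started after the stamp has finished.
  bool satisfied(uint64_t request_stamp) const {
    return finished > request_stamp;
  }
};
```

In the 1000-thread walkthrough, threads 2-1000 all carry the same stamp as thread 1's cycle, which is why one extra cycle suffices for all of them.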

> > In order to prevent starvation, we have to satisfy all critical allocations that have their make-or-break GC cycle associated with the current purge() operation, before we release the lock in purge(), letting new allocations in, or we will rely on luck again. However, among the pending critical allocations, they will all possibly have different make-or-break GC cycles associated with them. So in purge() some of them need to be satisfied, and others do not, yet can happily get their allocations satisfied opportunistically if possible. So we need to make sure they are ordered somehow, such that the earliest arriving pending critical allocations are satisfied first, before subsequent critical allocations (possibly waiting for a later GC), or we can get premature OOM situations again, where a thread releases a bunch of memory, expecting to be able to allocate, yet fails due to races with various threads.
> > The queue basically ensures the ordering of critical allocation satisfaction is sound, so that the pending critical allocations with the associated make-or-break GC being the one running purge(), are satisfied first, before satisfying (opportunistically) other critical allocations, that are really waiting for the next GC to happen.
> 
> I still don't get why the order of the critical allocations matters. I understand that even with your barrier, multiple threads can fail the initial allocation, enter the "satisfy_failed_metadata_allocation()" path, and now their allocations count as critical, since if they fail again they will throw an OOM. But why does the order of the critical allocations among themselves matter? Why not just let the critical allocations trickle out unordered? Is the relation to the GC cycle not arbitrary?

If we don't preserve the order, we would miss situations where a thread releases a large chunk of metaspace (e.g. by dropping a class loader reference), and then expects that memory to be available. A critical allocation associated with a subsequent GC could starve a thread whose allocation is associated with the current GC cycle, causing a premature OOM for that thread, even though the later allocation does not really need its memory until the next GC cycle; satisfying them in the right order would satisfy both allocations.

One might argue it might be okay to throw in such a scenario with an unordered solution; we are pretty close to running out of memory anyway, I guess. But I'm not really sure what such a solution would look like in more detail, and thought writing this little queue was easy enough, and for me easier to reason about, partly because we do the same dance in the GC to deal with heap OOM, which has been a success.
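As a sketch of why the FIFO order matters: at purge time, pending critical allocations are satisfied strictly in arrival order, so a later request can never consume the released memory ahead of the request whose make-or-break GC this is. The following toy model is purely illustrative; names and structure are hypothetical:

```cpp
#include <cstddef>
#include <deque>

// Toy model of the ordered critical-allocation queue. satisfy() is called
// during purge with the amount of metaspace the GC just released, and
// hands it out strictly in arrival order.
struct CriticalQueueModel {
  std::deque<size_t> pending; // FIFO: earliest critical allocation first

  void add(size_t words) { pending.push_back(words); }

  // Returns how many requests were satisfied. A later request is never
  // satisfied ahead of an earlier one, which is what prevents a newly
  // arrived allocation from starving the one that triggered this GC.
  int satisfy(size_t available) {
    int satisfied = 0;
    while (!pending.empty() && pending.front() <= available) {
      available -= pending.front();
      pending.pop_front();
      ++satisfied;
    }
    return satisfied;
  }
};
```

With an unordered scheme, the same amount of released memory could be grabbed by the back of the queue first, leaving the front-of-queue thread to throw prematurely.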

Thanks,


^--- I know this will be treated as a PR bot command, but I can't give up on my slash!

> > Thanks,
> > /Erik
> 
> We need an "erik" pr command :)

Indeed.

> Just FYI, I have very vague plans to extend usage of the metaspace allocator to other areas, to get back the cost of implementation. E.g. one candidate may be replacement of the current resource areas, which are just more primitive arena-based allocators. This is very vague and not a high priority, but I am a bit interested in keeping the coding generic if it's not too much effort. But I don't see your patch causing any problems there.

That sounds very interesting. We had internal discussions about riding on the MetaspaceExpand lock, which I believe would also work, but thought this really ought to be a separate thing so that we don't tie ourselves too tightly to the internal implementation of the allocator. Given that you might want to use the allocator for other stuff, it sounds like that is indeed the right choice.

> Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

