memorySegment.asByteBuffer().duplicate() causes multi-MB allocations

Sebastian Stenzel sebastian.stenzel at gmail.com
Wed Oct 26 10:37:23 UTC 2022


> On 26. Oct 2022, at 12:17, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> On 26/10/2022 10:56, Sebastian Stenzel wrote:
> 
>> Ok, let's consider this a measurement error for now. I will keep an eye on it and report if I get a reliably reproducible example. So hopefully never 😉
> Heh.
> 
> The strangest thing is if you noticed a difference between 18 and 19. The impl code has not changed at all, so I can't really explain how 19 would generate a spike in allocations.
> 
I only ever ran it on 19, so I have no reason to assume that there is any difference between 18 and 19.
> Also, as we discussed yesterday, the `duplicate` method is really just creating a dumb view over existing memory, so to get to 740Mb only with these views you'll need lots of objects :-)
> 
> And... I also believe that the memory view you are looking at is just looking at "shallow" size of an object. Which means you really have 740Mb only for the DirectByteBuffer instances, _excluding_ any on-heap or off-heap data they might refer to.
> 
> This seems...... excessive.
> 
> But, I can see the stack trace now, and it does point at DirectByteBuffer::duplicate - which makes sense: this method *is* creating a new view. But, JMC says there's 72 calls to `duplicate`, and 740MB of memory. 10Mb per direct buffer doesn't make any sense (considering this is shallow).
> 
> Another strange thing is that if you click on DirectByteBuffer, in "before" we see a massive spike at the beginning, then nothing. In "after" we see lots of allocation spread throughout the applications. These allocations come from MemorySegment::asByteBuffer, so, a different place.
> 
There is is an expected amount of allocations all the time, both "before" and "after", due to persistent invocations of the benchmarked method. However due to different scaling of the "before" chart it _appears_ like nothing after the first spike. When you zoom in on the time axis, so that the spike is out of view, you'll see similar patterns "before" and "after".
> My best explanation is that this method is invoked a lot, and all the buffer views created are short-lived, so they do not impact GC activity in any significant way. After a while, better inlining kicks in, and allocations are removed entirely, as DirectByteBuffer::duplicate is inlined and its result likely scalarized.
> 
> All this seems very very strange - almost like you are comparing different runs doing different things. Or, perhaps, some measurement error.
> 
> Maurizio
> 
> 
> 
>> 
>>> On 26. Oct 2022, at 11:51, Maurizio Cimadamore <maurizio.cimadamore at oracle.com <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>> 
>>> Ok, my JMC was out of date. Now I see what you see :-)
>>> 
>>> It is indeed odd - and also I note that all the allocations occur at the start of the recording. I wonder if maybe these are just direct buffer instances that have been created by some other bits of the JDK (several buffers are created at initialization time).
>>> 
>>> Without the ability to look at where the allocations came from it is a bit hard to guess. The fact that you are having troubles reproducing it makes me think that you got lucky/unlucky and the profiler happened to stumble on some burst direct buffer allocation, presumably at JVM startup.
>>> 
>>> Maybe a good experiment would be to lower the polling granularity, and see if you can spot the allocation on all the runs?
>>> 
>>> Maurizio
>>> 
>>> 
>>> 
>>> On 26/10/2022 10:18, Maurizio Cimadamore wrote:
>>>> Uhm I don't have all that data that you seem to have in your image :-(
>>>> 
>>>> My memory view looks empty. There is no allocation, and if I explicitly go and look I see 6-7 allocations in total :-)
>>>> 
>>>> Is the recording you shared the same you are looking at?
>>>> 
>>>> Maurizio
>>>> 
>>>> On 26/10/2022 10:11, Sebastian Stenzel wrote:
>>>>> Thanks for taking a look.
>>>>> 
>>>>> Please note that I'm not looking for a memory leak here: The objects are short-living (within the scope of a method) and no two instances exist at the same time, hence no increase in heap size.
>>>>> 
>>>>> Instead, what got me wondering was the mismatch between the amount of memory allocated for a rather small number of objects (740 MiB for 141 DirectByteBuffers). At least the record only contains these 72 allocations via `ByteBuffer.duplicate()` + 69 `NativeMemorySegmentImpl.makeByteBuffer()`. I'm looking at this chart: https://static.cryptomator.org/tmp/before890035c.png <https://urldefense.com/v3/__https://static.cryptomator.org/tmp/before890035c.png__;!!ACWV5N9M2RV99hQ!JckaBSJm5So1bCqFKc-DKIJwby_3IL0NrwJF28uq06RV8KF5cODe9zClnOFenKodB7r-jU1KwoEwrH3lNJ9ee622dbMqw0eIuQ$>
>>>>> 
>>>>> I have very limited experience with JFR and JMC and don't know if it is profiling of sampling allocations. But maybe the events are incomplete and I'm in fact dealing with thousands of instances?
>>>>> 
>>>>> 
>>>>>> On 26. Oct 2022, at 10:53, Maurizio Cimadamore <maurizio.cimadamore at oracle.com <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>>>> 
>>>>>> Looking at this recordings with JMC is inconclusive.
>>>>>> 
>>>>>> All I see is that your app seems to be using up to 20MB heap, which seems rather low.
>>>>>> 
>>>>>> The number of live objects, at least according to the tool is a handful (< 10).
>>>>>> 
>>>>>> I don't see anything here indicating a problem? (but maybe I'm looking in the wrong place?)
>>>>>> 
>>>>>> Maurizio
>>>>>> 
>>>>>> On 25/10/2022 15:15, Sebastian Stenzel wrote:
>>>>>>> Ok so here are the recordings:
>>>>>>> 
>>>>>>> with duplicate(): https://static.cryptomator.org/tmp/before890035c.jfr <https://urldefense.com/v3/__https://static.cryptomator.org/tmp/before890035c.jfr__;!!ACWV5N9M2RV99hQ!JDzHI5oiHSZJaLRKXbpqU6kqSKHSD0yZEIW8lQT4wRJq0M-_iS6Kz59xcSpcX_sMupZlUpzmv9uLpTNulFGUEIRVe93lmH_Uzw$>
>>>>>>> without: https://static.cryptomator.org/tmp/after890035c.jfr <https://urldefense.com/v3/__https://static.cryptomator.org/tmp/after890035c.jfr__;!!ACWV5N9M2RV99hQ!JDzHI5oiHSZJaLRKXbpqU6kqSKHSD0yZEIW8lQT4wRJq0M-_iS6Kz59xcSpcX_sMupZlUpzmv9uLpTNulFGUEIRVe93JaaOc7g$>
>>>>>>> 
>>>>>>> This is the diff in code: https://github.com/cryptomator/jfuse/commit/890035c702489cdb2e094a1fc767e75b15b04891 <https://urldefense.com/v3/__https://github.com/cryptomator/jfuse/commit/890035c702489cdb2e094a1fc767e75b15b04891__;!!ACWV5N9M2RV99hQ!JDzHI5oiHSZJaLRKXbpqU6kqSKHSD0yZEIW8lQT4wRJq0M-_iS6Kz59xcSpcX_sMupZlUpzmv9uLpTNulFGUEIRVe92yJr55Hg$>
>>>>>>> 
>>>>>>> I was testing the `read` method in this class. `write` (with its `.asReadOnlyBuffer()`) did not draw my attention.
>>>>>>> 
>>>>>>>> On 25. Oct 2022, at 15:03, Maurizio Cimadamore <maurizio.cimadamore at oracle.com <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>>>>>> 
>>>>>>>> No worries.
>>>>>>>> 
>>>>>>>> I guess having a look at jfr file would be useful. But note that I don't think you can send that to the mailing list - AFAIK, attachments are dropped.
>>>>>>>> 
>>>>>>>> Also, forgot to ask - is this something for which you saw a difference between 18 and 19? Because that would be odd too.
>>>>>>>> 
>>>>>>>> Maurizio
>>>>>>>> 
>>>>>>>> On 25/10/2022 13:26, Sebastian Stenzel wrote:
>>>>>>>>> I can't reproduce it any longer. 😖
>>>>>>>>> 
>>>>>>>>> I swear I would not have consulted the mailing list, if the observation had not been that spooky. Especially regarding said discrepency between `asReadOnlyBuffer()` and `duplicate()`.
>>>>>>>>> 
>>>>>>>>> I still have the .jfr file, though. Is this of any use (other than proving I'm not seeing ghosts)?
>>>>>>>>> 
>>>>>>>>>> On 25. Oct 2022, at 13:15, Maurizio Cimadamore <maurizio.cimadamore at oracle.com <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 25/10/2022 12:04, Maurizio Cimadamore wrote:
>>>>>>>>>>> ```
>>>>>>>>>>> public MappedByteBuffer duplicate() {
>>>>>>>>>>>         return new DirectByteBuffer(this,
>>>>>>>>>>>                                               this.markValue(),
>>>>>>>>>>>                                               this.position(),
>>>>>>>>>>>                                               this.limit(),
>>>>>>>>>>>                                               this.capacity(),
>>>>>>>>>>>                                               0,
>>>>>>>>>>> 
>>>>>>>>>>>                                               fileDescriptor(),
>>>>>>>>>>>                                               isSync(),
>>>>>>>>>>> 
>>>>>>>>>>>                                               segment);
>>>>>>>>>>>     }
>>>>>>>>>>> ```
>>>>>>>>>> Btw, the implementation of `adReadOnlyBuffer` is virtually identical to the above - the only difference is that DirectBuffer::asReadOnlyBuffer creates a new DIrectBufferR (where R stands for Read Only), not just another DirectBuffer. On read-only direct buffers (DirectBufferR), asReadOnly buffer just calls duplicate().
>>>>>>>>>> 
>>>>>>>>>> So, I think it's very strange that one one of them misbehave, since the implementation is so close.
>>>>>>>>>> 
>>>>>>>>>> What do you use to monitor the allocations? One theory is that perhaps one version of the code "escapes" better than the other, so the numbers you see are skewed, and in some cases you see the actual BB allocations on the heap, whereas in the other case you don't see anything because the JVM scalarizes the BB views. But again, the `duplicate` and `adreadOnlyBuffer` are so close to each other that it's hard to imagine meaningful differences in terms of escape analysis (even though escape analsyis can sometimes act in surprising ways).
>>>>>>>>>> 
>>>>>>>>>> Maurizio
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20221026/0bb395b1/attachment-0001.htm>


More information about the panama-dev mailing list