MemorySegment JVM memory leak
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Apr 23 13:22:54 UTC 2020
On 23/04/2020 13:32, Uwe Schindler wrote:
> Hi,
>
>> Thinking more about this, I think I stand by the existing API.
> That's fine. The little bit of "thread hacks" is easy for us as a first step. I can work on testing it with some live indexes. The only thing that would really be helpful - as noted before - would be to add an offset to MemorySegment::mapFromPath(). This would make it symmetric to what you can do with ByteBuffers.
Cool - glad that there's a path for Lucene to play more with this API!
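For comparison, here is the ByteBuffer side of the symmetry Uwe refers to. FileChannel.map already takes a file offset and a length, so a caller can map just a window of a large file. This is a minimal sketch using only standard NIO. The class and helper names are made up for illustration. A MemorySegment overload with an offset is not shown, because its exact shape was still under discussion.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MapWithOffset {
    // Maps only the bytes [offset, offset + length) of the file, read-only.
    // Index 0 of the returned buffer corresponds to file position 'offset'.
    static byte[] readRegion(Path file, long offset, int length) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer region = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            region.get(out); // copy the mapped window into a heap array
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Illustrative helper: writes 'data' to a fresh temporary file.
    static Path tempFileWith(byte[] data) {
        try {
            Path tmp = Files.createTempFile("region", ".bin");
            Files.write(tmp, data);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = tempFileWith(new byte[] {10, 11, 12, 13, 14, 15, 16, 17});
        System.out.println(Arrays.toString(readRegion(tmp, 3, 4))); // [13, 14, 15, 16]
    }
}
```

The offset does not have to be page-aligned here. FileChannel.map handles the alignment internally, which is part of what makes the ByteBuffer API convenient for index files.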
>
>> Short-term is going to be a little inconvenient, yes, because to
>> map/unmap files you basically need to go native (or to call close() on
>> the original segment and accept some thread dances)
> That's my initial plan for a mockup until native APIs can be called with ABI integration.
>
>> But not so long-term there's gonna be the next chapter in the story -
>> the ABI integration - which will let you create native method handles;
>> that is method handles targeting native library functions. Once this
>> capability comes in (and you can already try it out in the Panama repo),
>> it is relatively easy to map/unmap a file directly, w/o using JNI code
>> or any kind of external dll/so lib.
> That's something Elasticsearch is already waiting for. It currently uses JNA to lock pages or to install a seccomp syscall filter on startup (https://www.elastic.co/de/blog/seccomp-in-the-elastic-stack, https://fosdem.org/2020/schedule/event/security_seccomp/attachments/slides/3881/export/events/attachments/security_seccomp/slides/3881/seccomp.pdf)
That makes sense - then I hope that what we have in the pipeline will be
useful in the long run ;-)
>
>> For instance you could create a truly unsafe segment w/o even using the
>> standard MemorySegment::mapFromFile; below is a simple Gist which shows
>> how you can do something like that using the ABI support:
>>
>> https://gist.github.com/mcimadamore/128ee904157bb6c729a10596e69edffd
>>
>> (it is less than 100 LoC - half of it is the test logic).
> That's indeed cool. The actual implementation will be a bit harder, as you need to add a Windows part, too. But in general one could live with that.
Yep - Windows needs to be added too - but on this point, while writing
this mock up I realized how much mapped buffers were already in the
realm of "system programming". It is probably going to be hard for a
general purpose Java API to extract every ounce of performance out of
memory mapped files if it cannot make assumptions about which primitives
are available underneath. With this approach you could have many layers -
a general purpose one (to just create a memory mapped segment) plus
ad-hoc platform-dependent layers which give you access to more
functionality (which you can then use to speed things up).
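The layering described above could look roughly like this in plain Java. A portable layer only maps file regions. It also exposes an optional capability that it reports as unavailable, and a platform-specific layer would override that capability. For instance, such a layer could call madvise through native method handles, which is not shown here. All interface and class names are hypothetical. The portable layer relies only on FileChannel.map.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LayeredMapping {
    // Hypothetical general-purpose layer: portable, only knows how to map a region.
    interface MappedFiles {
        MappedByteBuffer map(Path file, long offset, long length);

        // Optional platform capability; the portable layer reports it as unavailable.
        default boolean adviseDontNeed(MappedByteBuffer region) {
            return false;
        }
    }

    static final class PortableMappedFiles implements MappedFiles {
        @Override
        public MappedByteBuffer map(Path file, long offset, long length) {
            // The mapping stays valid after the channel is closed.
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // A platform-dependent layer (hypothetical, not shown) would extend
    // PortableMappedFiles and override adviseDontNeed with a native call.

    static String demo() {
        try {
            Path tmp = Files.createTempFile("layered", ".bin");
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            MappedFiles files = new PortableMappedFiles();
            MappedByteBuffer buf = files.map(tmp, 1, 2); // window over bytes {2, 3}
            return buf.get(0) + " " + files.adviseDontNeed(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2 false
    }
}
```

Callers that only need a mapped region depend on the portable interface. Code that wants the extra knobs can check whether the capability is actually available.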
>
> This would also help to add fadvise() for standard iostream-based APIs, like when you copy files (see my other mail). Sometimes we read files just to copy/transform their contents once, and they get deleted afterwards. Kicking all already-cached pages out of the FS cache just because you load a huge file that you know you will never read again is really something you should not do. So we can improve that, too. Although for some use cases it would be cool if the standard NIO file API also provided ways to handle file caching, because as soon as you switch to native APIs you also have to implement your own InputStreams (as the underlying file descriptor is unreachable). I hope Panama will allow calling methods that take a java.io.FileDescriptor or a Path object, which is somehow magically converted to an integer file descriptor.
MethodHandles can be adapted - perhaps in the future we could add some
ad-hoc MH adapters to go from int to FileDescriptor and back; these will
probably be platform-dependent adapters too.
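Here is a sketch of what such an adapter could look like with today's java.lang.invoke combinators. It makes two assumptions. First, a hypothetical Fd wrapper stands in for java.io.FileDescriptor, whose int value is not exposed through public API. Second, a plain Java method stands in for a native method handle that takes a raw int descriptor. MethodHandles.filterArguments does the conversion, so callers pass the wrapper while the target still receives an int.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FdAdapterSketch {
    // Hypothetical stand-in for java.io.FileDescriptor.
    static final class Fd {
        private final int value;
        Fd(int value) { this.value = value; }
        int intValue() { return value; }
    }

    // Stand-in for a native method handle taking a raw int fd; here it returns fd * 10.
    static int nativeLikeCall(int fd) {
        return fd * 10;
    }

    // Adapts (int)int into (Fd)int by filtering argument 0 through Fd::intValue.
    static MethodHandle adaptToFd() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(FdAdapterSketch.class, "nativeLikeCall",
                MethodType.methodType(int.class, int.class));
        MethodHandle unwrap = lookup.findVirtual(Fd.class, "intValue",
                MethodType.methodType(int.class));
        return MethodHandles.filterArguments(target, 0, unwrap);
    }

    static int callWithFd(int raw) {
        try {
            MethodHandle mh = adaptToFd();
            return (int) mh.invokeExact(new Fd(raw)); // call-site type (Fd)int
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithFd(3)); // 30
    }
}
```

The reverse direction (int back to a wrapper) would use MethodHandles.filterReturnValue in the same way. A real FileDescriptor adapter would need platform-specific access to the descriptor.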
>
>> I think that, for power-users like you, this way of doing things is
>> probably more direct than bending MappedByteBuffer, or
>> MappedMemorySegment exactly the way you want it. At some point you are
>> gonna need some extra customization (you mentioned MADV_DONTNEED,
>> other people mentioned MADV_REMOVE) which might make a difference in your
>> case; while in some cases some additional API points will be added to
>> the JDK, we can't expect the JDK to support all possible ways in which a
>> client might wish to interact with memory mapped files. But with custom
>> memory segments + ABI support you don't need to wait on the JDK to give
>> you the knobs you want - you can just reach for them directly.
> I agree with that.
>
>> I think that's a much saner way to get things done - in a way, the whole
>> mappedXYZ business is a big workaround for the fact that we have no
>> other ways in Java to reason about memory mapped files (which are
>> useful!) but their behavior is ultimately platform-dependent, hence some
>> of the APIs in MappedByteBuffer and MappedMemorySegment are "best
>> effort" or simply do nothing on certain platforms (e.g. Windows).
> MappedByteBuffer works very well on Windows. The only problems are the usual shit like you can't delete the file while a byte buffer is still alive!
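This sketch makes the "best effort" point concrete. MappedByteBuffer.load() and isLoaded() are only hints, while force() writes changes back to the storage device. The class and method names are illustrative. The sketch also tolerates the Windows behavior Uwe describes: deleting a file can fail while a mapping of it is still reachable. A mapping stays valid until the buffer itself is garbage collected, so the JDK gives no way to unmap it eagerly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedHints {
    // Writes one byte through a mapping and reads it back through regular file I/O.
    static byte writeThroughMapping(byte value) {
        try {
            Path tmp = Files.createTempFile("mapped", ".bin");
            Files.write(tmp, new byte[4096]);
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
                buf.load();                        // hint: try to make the pages resident
                boolean resident = buf.isLoaded(); // hint only; may be false even after load()
                System.out.println("isLoaded hint: " + resident);
                buf.put(0, value);
                buf.force();                       // flush the change to the storage device
            }
            byte readBack = Files.readAllBytes(tmp)[0];
            try {
                Files.delete(tmp);
            } catch (IOException e) {
                // On Windows this can fail while the mapped buffer is still reachable.
                tmp.toFile().deleteOnExit();
            }
            return readBack;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThroughMapping((byte) 42)); // 42
    }
}
```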
>
>> So, maybe what you need, ultimately, is your own custom segment factory.
>> All still written in Java - but in a "different kind" of Java.
> +1
Seems like we are on the same page - thanks a lot for the feedback!
Maurizio
>
>> Maurizio
> Thanks,
> Uwe
>
More information about the panama-dev mailing list