RFR 8243491: Implementation of Foreign-Memory Access API (Second Incubator)

Wed Apr 29 00:41:46 UTC 2020

On 28/04/2020 23:08, Peter Levart wrote:
>
>
> On 4/28/20 10:49 PM, Maurizio Cimadamore wrote:
>>
>> On 28/04/2020 17:12, Peter Levart wrote:
>>> Hi Maurizio,
>>>
>>> I'm checking out the thread-confinement in the parallel stream case.
>>> I see that Spliterator.trySplit() is calling
>>> AbstractMemorySegmentImpl's:
>>>
>>>     private AbstractMemorySegmentImpl asSliceNoCheck(long offset, long newSize) {
>>>         return dup(offset, newSize, mask, owner, scope);
>>>     }
>>>
>>> ...so here the "owner" of the slice is still the same as that of the
>>> parent segment...
>>>
>>> But then later, in tryAdvance or forEachRemaining, the segment is
>>> acquired/closed for each element of the stream (in the case of
>>> tryAdvance) or once for the whole chunk up to the end of the
>>> spliterator (in the case of forEachRemaining). So some pipelines will
>>> be more efficient than others...
>>
>> Not sure I follow here - don't you have to create a new segment for
>> each element of the stream anyway, since you don't know which thread
>> is going to process it?
>
> When forEachRemaining is called for all remaining elements, you only
> have to acquire one child scope, since your loop will process all
> elements in the same thread. You do create slices for each element,
> but slices don't acquire a new child scope and close it afterwards. But
> when tryAdvance is called for each element from the pipeline, you have
> to acquire a new scope and close it after each element. And this is the
> point of contention, since multiple threads will be doing the same
> concurrently... So depending on whether the pipeline might short-circuit
> the stream or not, execution could be more or less efficient, maybe to
> the point where contention is so large that it is prohibitive.
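To make the cost difference concrete, here is a minimal, self-contained sketch in plain Java that does not use the incubator API. `Scope`, `ChildScope` and `ScopedSpliterator` are hypothetical stand-ins: `Scope.acquire()` just counts how many child scopes are created, and the spliterator walks a range of element indices, acquiring one child per `tryAdvance` call but only one per `forEachRemaining` call.

```java
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ScopeAcquisitionDemo {
    private ScopeAcquisitionDemo() { }

    // Hypothetical child scope: like AutoCloseable, but close() throws nothing.
    interface ChildScope extends AutoCloseable {
        @Override
        void close();
    }

    // Hypothetical stand-in for a segment's scope; counts every acquisition.
    static final class Scope {
        final AtomicInteger acquisitions = new AtomicInteger();

        ChildScope acquire() {
            acquisitions.incrementAndGet(); // shared state all worker threads would contend on
            return () -> { };               // closing the child is a no-op in this sketch
        }
    }

    // Hypothetical spliterator over the element indices [from, to) of a segment.
    static final class ScopedSpliterator implements Spliterator<Integer> {
        private final Scope scope;
        private int from;
        private final int to;

        ScopedSpliterator(Scope scope, int from, int to) {
            this.scope = scope;
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (from >= to) {
                return false;
            }
            // One acquire/close pair per element.
            try (ChildScope child = scope.acquire()) {
                action.accept(from++);
            }
            return true;
        }

        @Override
        public void forEachRemaining(Consumer<? super Integer> action) {
            // One acquire/close pair for the whole remaining chunk, since the
            // calling thread processes every element itself.
            try (ChildScope child = scope.acquire()) {
                while (from < to) {
                    action.accept(from++);
                }
            }
        }

        @Override
        public Spliterator<Integer> trySplit() {
            int mid = (from + to) >>> 1;
            if (mid <= from) {
                return null;
            }
            // Like a slice, the prefix shares the parent's scope.
            Spliterator<Integer> prefix = new ScopedSpliterator(scope, from, mid);
            from = mid;
            return prefix;
        }

        @Override
        public long estimateSize() {
            return to - from;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED;
        }
    }
}
```

Draining 100 elements with `tryAdvance` records 100 acquisitions, while `forEachRemaining` records one. In a sequential stream, `forEach` typically drains the spliterator through `forEachRemaining`, whereas short-circuiting operations such as `findFirst` or `anyMatch` pull elements with `tryAdvance`, which is where the per-element cost shows up.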

OK - see my other reply

Maurizio

>
> Peter
>
>>
>> Maurizio
>>
>>>
>>> So I'm thinking: would it be possible to "lazily" acquire the scope
>>> just once in tryAdvance and then re-use it until the end?
>>> Unfortunately Spliterator does not have a close() method to be
>>> called when the pipeline is done with it. Perhaps it could be added
>>> to the API? This is not the first time I wished Spliterator had a
>>> close method. I had a similar problem when trying to create a
>>> Spliterator with a database backend. When using the JDBC API, a
>>> separate transaction (Connection) is typically required for each
>>> thread of execution, since several frameworks bind it to a ThreadLocal.
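One way to read the lazy-acquisition idea, as a hedged sketch with hypothetical names (`Scope`, `ChildScope`, `LazySpliterator`): each spliterator acquires its child scope on the first `tryAdvance` and keeps it until `close()` is called. Since `Spliterator` has no `close()` today, nothing in the stream framework would call it; in this sketch the caller must invoke it by hand, which is exactly the gap being pointed out.

```java
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class LazyScopeDemo {
    private LazyScopeDemo() { }

    // Hypothetical child scope whose close() throws no checked exception.
    interface ChildScope extends AutoCloseable {
        @Override
        void close();
    }

    // Hypothetical scope: counts acquisitions and currently open children.
    static final class Scope {
        final AtomicInteger acquisitions = new AtomicInteger();
        final AtomicInteger open = new AtomicInteger();

        ChildScope acquire() {
            acquisitions.incrementAndGet();
            open.incrementAndGet();
            return open::decrementAndGet; // closing the child decrements the open count
        }
    }

    // Acquires its child scope on first use and keeps it until close().
    static final class LazySpliterator implements Spliterator<Integer>, AutoCloseable {
        private final Scope scope;
        private int from;
        private final int to;
        private ChildScope child; // null until the first element is consumed

        LazySpliterator(Scope scope, int from, int to) {
            this.scope = scope;
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (from >= to) {
                return false;
            }
            if (child == null) {
                child = scope.acquire(); // acquired at most once per spliterator
            }
            action.accept(from++);
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            int mid = (from + to) >>> 1;
            if (mid <= from) {
                return null;
            }
            // The prefix lazily acquires its own child scope when traversed.
            LazySpliterator prefix = new LazySpliterator(scope, from, mid);
            from = mid;
            return prefix;
        }

        @Override
        public long estimateSize() {
            return to - from;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED;
        }

        // The missing hook: no stream pipeline would call this today.
        @Override
        public void close() {
            if (child != null) {
                child.close();
                child = null;
            }
        }
    }
}
```

In a parallel pipeline the framework creates and discards the split spliterators itself, so a child acquired this way would simply never be released; that is why the design hinges on a close hook the `Spliterator` API does not have.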
>>>
>>> WDYT?
>>>
>>> Regards, Peter
>>>
>>>
>>> On 4/23/20 10:33 PM, Maurizio Cimadamore wrote:
>>>> Hi,
>>>> time has come for another round of foreign memory access API
>>>> incubation (see JEP 383 [3]). This iteration aims at polishing some
>>>> of the rough edges of the API, and adds some of the functionality
>>>> that developers have been asking for during the first round of
>>>> incubation. The revised API tightens the thread-confinement
>>>> constraints (by removing the MemorySegment::acquire method) and
>>>> instead provides more targeted support for parallel computation via
>>>> a segment spliterator. The API also adds a way to create a custom
>>>> native segment; this is, essentially, an unsafe API point, very
>>>> similar in spirit to the JNI NewDirectByteBuffer functionality [1].
>>>> By using this bit of API, power-users will be able to add support,
>>>> via MemorySegment, for *their own memory sources* (e.g. think of a
>>>> custom allocator written in C/C++). For now, this API point is
>>>> marked as "restricted", and a special read-only JDK property
>>>> will have to be set on the command line for calls to this method to
>>>> succeed. We are aware there's no precedent for something like this
>>>> in the Java SE API - but if Project Panama is to remain true to
>>>> its ultimate goal of replacing bits of JNI code with (low level)
>>>> Java code, stuff like this has to be *possible*. We anticipate
>>>> that, at some point, this property will become a true launcher
>>>> flag, and that the foreign restricted machinery will be integrated
>>>> more neatly into the module system.
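A minimal sketch of how a "restricted" entry point could be gated on a read-only property. The property name and value come from the API changes list in this message (`-Dforeign.unsafe=permit`); the class name, method name and exception type are hypothetical and not taken from the webrev. Reading the property once into a `static final` field is what makes it effectively read-only.

```java
final class RestrictedGate {
    private RestrictedGate() { }

    // Sampled once, when the class is initialized; changing the system
    // property afterwards has no effect, so it behaves as read-only.
    private static final boolean PERMITTED =
            "permit".equals(System.getProperty("foreign.unsafe"));

    // Hypothetical guard to call at the top of every restricted API point.
    static void checkRestricted(String method) {
        if (!PERMITTED) {
            // Exception type chosen for illustration only.
            throw new UnsupportedOperationException(
                    "restricted method " + method + " requires -Dforeign.unsafe=permit");
        }
    }
}
```

Setting the property programmatically after the guard class has been initialized does not lift the restriction; only the value seen at startup counts.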
>>>>
>>>> A list of the API, implementation and test changes is provided 
>>>> below. If you have any questions, or need more detailed 
>>>> explanations, I (and the rest of the Panama team) will be happy to 
>>>> point at existing discussions, and/or to provide the feedback 
>>>> required.
>>>>
>>>> Thanks
>>>> Maurizio
>>>>
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~mcimadamore/8243491_v1/webrev
>>>>
>>>> Javadoc:
>>>>
>>>> http://cr.openjdk.java.net/~mcimadamore/8243491_v1/javadoc
>>>>
>>>> Specdiff:
>>>>
>>>> http://cr.openjdk.java.net/~mcimadamore/8243491_v1/specdiff/overview-summary.html 
>>>>
>>>>
>>>> CSR:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8243496
>>>>
>>>>
>>>>
>>>> API changes
>>>> ===========
>>>>
>>>> * MemorySegment
>>>>   - drop support for acquire() method - in its place now you can 
>>>> obtain a spliterator from a segment, which supports divide-and-conquer
>>>>   - revamped support for views - e.g. isReadOnly - now segments 
>>>> have access modes
>>>>   - added API to do serial confinement hand-off 
>>>> (MemorySegment::withOwnerThread)
>>>>   - added unsafe factory to construct a native segment out of an 
>>>> existing address; this API is "restricted" and only available if 
>>>> the program is executed using the -Dforeign.unsafe=permit flag.
>>>>   - the MemorySegment::mapFromPath now returns a MappedMemorySegment
>>>> * MappedMemorySegment
>>>>   - small sub-interface which provides extra capabilities for 
>>>> mapped segments (load(), unload() and force())
>>>> * MemoryAddress
>>>>   - added distinction between *checked* and *unchecked* addresses; 
>>>> *unchecked* addresses do not have a segment, so they cannot be 
>>>> dereferenced
>>>>   - added NULL memory address (it's an unchecked address)
>>>>   - added factory to construct MemoryAddress from long value 
>>>> (result is also an unchecked address)
>>>>   - added API point to get raw address value (where possible - e.g. 
>>>> if this is not an address pointing to a heap segment)
>>>> * MemoryLayout
>>>>   - Added support for layout "attributes" - e.g. store metadata 
>>>> inside MemoryLayouts
>>>>   - Added MemoryLayout::isPadding predicate
>>>>   - Added helper functions to SequenceLayout to reshape/flatten
>>>> sequence layouts (a la NDArray [4])
>>>> * MemoryHandles
>>>>   - add support for general VarHandle combinators (similar to MH 
>>>> combinators)
>>>>   - add a combinator to turn a long-VH into a MemoryAddress VH (the 
>>>> resulting MemoryAddress is also *unchecked* and cannot be 
>>>> dereferenced)
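The reshape/flatten bullet only names the feature. As a rough illustration of the arithmetic involved (in the spirit of `numpy.reshape` [4]), assume a sequence layout is described by its element count per dimension: flattening multiplies the counts, and reshaping must preserve the total, with at most one dimension given as `-1` and inferred. `ReshapeSketch`, `flatten` and `reshape` are hypothetical names working on plain `long[]` dimensions, not the `SequenceLayout` API.

```java
final class ReshapeSketch {
    private ReshapeSketch() { }

    // Total number of elements across all dimensions, e.g. [3][4] -> 12.
    static long flatten(long... dims) {
        long total = 1;
        for (long d : dims) {
            if (d < 0) {
                throw new IllegalArgumentException("negative dimension: " + d);
            }
            total = Math.multiplyExact(total, d);
        }
        return total;
    }

    // Reshapes `count` elements into `newDims`; one entry may be -1, in which
    // case it is inferred so that the element count is preserved.
    static long[] reshape(long count, long... newDims) {
        long[] result = newDims.clone();
        int inferred = -1;
        long known = 1;
        for (int i = 0; i < result.length; i++) {
            if (result[i] == -1) {
                if (inferred != -1) {
                    throw new IllegalArgumentException("only one dimension can be inferred");
                }
                inferred = i;
            } else if (result[i] <= 0) {
                throw new IllegalArgumentException("invalid dimension: " + result[i]);
            } else {
                known = Math.multiplyExact(known, result[i]);
            }
        }
        if (inferred != -1) {
            if (count % known != 0) {
                throw new IllegalArgumentException("cannot infer dimension for " + count + " elements");
            }
            result[inferred] = count / known;
        } else if (known != count) {
            throw new IllegalArgumentException("element count mismatch: " + known + " != " + count);
        }
        return result;
    }
}
```

For example, a sequence of 12 elements reshaped with dimensions `(3, -1)` becomes `[3][4]`, while `(5, -1)` is rejected because 12 is not a multiple of 5.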
>>>>
>>>> Implementation changes
>>>> ======================
>>>>
>>>> * add support for VarHandle combinators (e.g. IndirectVH)
>>>>
>>>> The idea here is simple: a VarHandle can almost be thought of as a 
>>>> set of method handles (one for each access mode supported by the 
>>>> var handle) that are lazily linked. This gives us a relatively
>>>> simple foundation upon which to build support for custom var handle
>>>> adapters: we can create a VarHandle by passing an existing var
>>>> handle and specifying the set of adaptations that should be
>>>> applied to the method handle for a given access mode in the
>>>> original var handle. The result is a new VarHandle which might
>>>> support a different carrier type and more or fewer coordinate
>>>> types. Adding this support was relatively easy - it only
>>>> required one low-level surgery of the lambda forms generated for
>>>> adapted var handles (this is required so that the "right" var handle
>>>> receiver can be used for dispatching the access mode call).
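The "set of method handles" view can be illustrated with standard `java.lang.invoke` calls alone: `VarHandle.toMethodHandle(AccessMode.GET)` yields the method handle for one access mode, and an ordinary method handle combinator (`MethodHandles.filterReturnValue`) then adapts it, here changing the carrier type from `int` to `String`. This is a sketch of the idea for a single access mode, not how the adapted var handles in the webrev are built; the class name is hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

final class VarHandleAsMethodHandles {
    private VarHandleAsMethodHandles() { }

    // An int[] element var handle: coordinates (int[], int), carrier int.
    static final VarHandle INT_ELEMENTS = MethodHandles.arrayElementVarHandle(int[].class);

    // The GET access mode as a method handle, adapted to return a String.
    static final MethodHandle GET_AS_STRING;

    static {
        try {
            MethodHandle get = INT_ELEMENTS.toMethodHandle(VarHandle.AccessMode.GET);
            MethodHandle toStr = MethodHandles.lookup().findStatic(
                    Integer.class, "toString", MethodType.methodType(String.class, int.class));
            // (int[], int)int  filtered by  (int)String  ->  (int[], int)String
            GET_AS_STRING = MethodHandles.filterReturnValue(get, toStr);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String getAsString(int[] array, int index) {
        try {
            return (String) GET_AS_STRING.invokeExact(array, index);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Doing the same for every access mode of the original var handle, and linking each adapted handle lazily, is the shape of the var handle adapters described above.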
>>>>
>>>> All the new adapters in the MemoryHandles API (which are actually
>>>> defined inside VarHandles) are just a bunch of MH adapters
>>>> that are stitched together into a brand new VH. The only caveat is
>>>> that we could have a checked exception mismatch: the VarHandle API
>>>> methods are specified not to throw any checked exception, whereas 
>>>> method handles can throw any throwable. This means that, 
>>>> potentially, calling get() on an adapted VarHandle could result in 
>>>> a checked exception being thrown; to solve this gnarly issue, we 
>>>> decided to scan all the filter functions passed to the VH 
>>>> combinators and look for direct method handles which throw checked 
>>>> exceptions. If such MHs are found (these can be deeply nested, 
>>>> since the MHs can be adapted on their own), adaptation of the 
>>>> target VH fails fast.
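A rough sketch of the fail-fast check, restricted to the easy case of *direct* method handles: `MethodHandles.Lookup.revealDirect` exposes the underlying member via `MethodHandleInfo.reflectAs`, whose declared exception types can then be inspected. Handles that are not direct (e.g. the result of other adaptations) make `revealDirect` throw `IllegalArgumentException`; the real implementation has to look inside nested adaptations, which public APIs do not expose, so this sketch simply does not inspect them. `CheckedExceptionScan`, `throwsChecked` and `checkFilters` are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Executable;
import java.lang.reflect.Member;

final class CheckedExceptionScan {
    private CheckedExceptionScan() { }

    // Checked = a Throwable that is neither a RuntimeException nor an Error.
    static boolean isChecked(Class<?> type) {
        return Throwable.class.isAssignableFrom(type)
                && !RuntimeException.class.isAssignableFrom(type)
                && !Error.class.isAssignableFrom(type);
    }

    // True if `filter` is a direct method handle whose underlying method or
    // constructor declares a checked exception. Non-direct handles are not inspected.
    static boolean throwsChecked(MethodHandles.Lookup lookup, MethodHandle filter) {
        MethodHandleInfo info;
        try {
            info = lookup.revealDirect(filter);
        } catch (IllegalArgumentException notDirect) {
            return false;
        }
        Member member = info.reflectAs(Member.class, lookup);
        if (!(member instanceof Executable)) {
            return false; // field accessors declare no exceptions
        }
        for (Class<?> type : ((Executable) member).getExceptionTypes()) {
            if (isChecked(type)) {
                return true;
            }
        }
        return false;
    }

    // Fail-fast check, run before any adapted var handle is built.
    static void checkFilters(MethodHandles.Lookup lookup, MethodHandle... filters) {
        for (MethodHandle filter : filters) {
            if (throwsChecked(lookup, filter)) {
                throw new IllegalArgumentException("filter throws checked exceptions: " + filter);
            }
        }
    }
}
```

With this check, a filter built from `Integer.parseInt(String)` passes (it only declares the unchecked `NumberFormatException`), whereas one built from `java.nio.file.Files.size(Path)` is rejected because it declares `IOException`.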
>>>>
>>>>
>>>> * More ByteBuffer implementation changes
>>>>
>>>> Some more changes to ByteBuffer support were necessary here. First, 
>>>> we have added support for retrieval of "mapped" properties 
>>>> associated with a ByteBuffer (e.g. the file descriptor, etc.). This 
>>>> is crucial if we want to be able to turn an existing byte buffer 
>>>> into the "right kind" of memory segment.
>>>>
>>>> Conversely, we also have to allow creation of mapped byte buffers 
>>>> given existing parameters - which is needed when going from 
>>>> (mapped) segment to a buffer. These two pieces together allow us to 
>>>> go from segment to buffer and back without losing any information about
>>>> the underlying memory mapping (which was an issue in the previous 
>>>> implementation).
>>>>
>>>> Lastly, to support the new MappedMemorySegment abstraction, all the 
>>>> memory mapped supporting functionalities have been moved into a 
>>>> common helper class so that MappedMemorySegmentImpl can reuse that 
>>>> (e.g. for MappedMemorySegment::force).
>>>>
>>>> * Rewritten memory segment hierarchy
>>>>
>>>> The old implementation had a monomorphic memory segment class. In 
>>>> this round we aimed at splitting the various implementation classes 
>>>> so that we have a class for heap segments (HeapMemorySegmentImpl), 
>>>> one for native segments (NativeMemorySegmentImpl) and one for 
>>>> memory mapped segments (MappedMemorySegmentImpl, which extends from 
>>>> NativeMemorySegmentImpl). Not much to see here - although one
>>>> important point is that, by doing this, we have been able to
>>>> improve performance quite a bit, since now e.g. native/mapped segments
>>>> are _guaranteed_ to have a null "base". We have also done a few
>>>> tricks to make sure that the "base" accessor for heap segments is
>>>> sharply typed and also NPE checked, which allows C2 to speculate
>>>> more and hoist. With these changes _all_ segment types have
>>>> comparable performance and hoisting guarantees (unlike in the old
>>>> implementation).
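A toy version of the split hierarchy, to show the shape of the per-kind `base()` accessor: native (and therefore mapped) segments return a constant `null` base that subclasses cannot override, while the heap variant returns a sharply typed, null-checked `byte[]`. The class names are hypothetical simplifications, not the classes in the webrev, and a direct `ByteBuffer` stands in for raw off-heap addressing so the sketch runs without `Unsafe`; it shows the structure only and says nothing about the JIT behaviour described above.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

final class SegmentHierarchySketch {
    private SegmentHierarchySketch() { }

    // Common supertype; base() is the object used for addressing.
    abstract static class ToySegment {
        final long size;

        ToySegment(long size) {
            this.size = size;
        }

        abstract Object base();

        abstract byte get(int offset);
    }

    // Heap segment: the base is sharply typed (byte[]) and never null.
    static final class ToyHeapSegment extends ToySegment {
        private final byte[] array;

        ToyHeapSegment(byte[] array) {
            super(Objects.requireNonNull(array).length);
            this.array = array;
        }

        @Override
        byte[] base() { // covariant return type narrows Object to byte[]
            return array;
        }

        @Override
        byte get(int offset) {
            return base()[offset];
        }
    }

    // Native segment: the base is always null.
    static class ToyNativeSegment extends ToySegment {
        private final ByteBuffer memory; // stand-in for raw off-heap memory

        ToyNativeSegment(int size) {
            super(size);
            this.memory = ByteBuffer.allocateDirect(size);
        }

        @Override
        final Object base() {
            return null; // final: mapped segments cannot change it
        }

        @Override
        byte get(int offset) {
            return memory.get(offset);
        }
    }

    // Mapped segment: inherits the null base from the native segment.
    static final class ToyMappedSegment extends ToyNativeSegment {
        ToyMappedSegment(int size) {
            super(size);
        }
    }
}
```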
>>>>
>>>> * Add workarounds in MemoryAddressProxy, AbstractMemorySegmentImpl
>>>> to special case "small segments" so that the VM can apply bound check
>>>> elimination
>>>>
>>>> This is another important piece which allows us to get very good
>>>> performance out of indexed memory access var handles; as you might
>>>> know, the JIT compiler has trouble optimizing loops where the
>>>> loop variable is a long [2]. To make up for that, in this round we
>>>> add an optimization which allows the API to detect whether a
>>>> segment is *small* or *large*. For small segments, the API realizes
>>>> that there's no need to perform long computation (e.g. to perform
>>>> bound checks, or offset additions), so it falls back to integer
>>>> logic, which in turn allows bound check elimination.
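A minimal sketch of the small/large split, assuming a segment knows its size up front: when the size fits in an `int`, the bounds check is done purely with `int` arithmetic (here via the JDK 9+ `Objects.checkFromIndexSize(int, int, int)`), and only large segments pay for `long` arithmetic. `BoundsSketch` and `checkAccess` are hypothetical names; the actual change lives in `MemoryAddressProxy` and `AbstractMemorySegmentImpl`.

```java
import java.util.Objects;

final class BoundsSketch {
    private final long size;
    private final boolean small; // true when the size fits in an int

    BoundsSketch(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        }
        this.size = size;
        this.small = size <= Integer.MAX_VALUE;
    }

    // Checks that the access [offset, offset + length) lies inside the segment.
    void checkAccess(long offset, long length) {
        if (small) {
            // Any in-bounds offset/length of a small segment fits in an int.
            if (offset != (int) offset || length != (int) length) {
                throw new IndexOutOfBoundsException("offset " + offset + ", length " + length);
            }
            // int-only check; throws IndexOutOfBoundsException on failure.
            Objects.checkFromIndexSize((int) offset, (int) length, (int) size);
        } else {
            // Large segment: long arithmetic. size - length cannot overflow
            // because both are non-negative when it is evaluated.
            if (offset < 0 || length < 0 || offset > size - length) {
                throw new IndexOutOfBoundsException("offset " + offset + ", length " + length);
            }
        }
    }
}
```

The `small` flag is fixed at construction, so code that only ever sees small segments stays on the `int` path.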
>>>>
>>>> * renaming of the various var handle classes to conform to "memory 
>>>> access var handle" terminology
>>>>
>>>> This is mostly stylistic, nothing to see here.
>>>>
>>>> Tests changes
>>>> =============
>>>>
>>>> In addition to the tests for the new API changes, we've also added 
>>>> some stress tests for var handle combinators - e.g. there's a flag 
>>>> that can be enabled which turns on some "dummy" var handle 
>>>> adaptations on all var handles created by the runtime. We've used 
>>>> this flag on existing tests to make sure that things work as expected.
>>>>
>>>> To sanity test the new memory segment spliterator, we have wired 
>>>> the new segment spliterator with the existing spliterator test 
>>>> harness.
>>>>
>>>> We have also added several micro benchmarks for the memory segment 
>>>> API (and made some changes to the build script so that native 
>>>> libraries would be handled correctly).
>>>>
>>>>
>>>> [1] - https://docs.oracle.com/en/java/javase/14/docs/specs/jni/functions.html#newdirectbytebuffer
>>>> [2] - https://bugs.openjdk.java.net/browse/JDK-8223051
>>>> [3] - https://openjdk.java.net/jeps/383
>>>> [4] - https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html#numpy.reshape
>>>>
>>>>
>>>
>