[External] : Re: MemorySegment.ofAddress(...).reinterpret(...)

Brian S O'Neill bronee at gmail.com
Thu Jul 6 18:45:15 UTC 2023


Thanks for looking into this. I'm already doing unaligned accesses, so 
there's not much I can do at this point on my end. I'll try to identify 
which operations are the most expensive and report back.

On 2023-07-06 11:31 AM, Jorn Vernee wrote:
> I took a look at the generated assembly [1] for the payload method in:
> 
>      static final int ELEM_SIZE = 10;
>      static final int CARRIER_SIZE = (int)JAVA_INT.byteSize();
>      static final int BYTE_SIZE = ELEM_SIZE * CARRIER_SIZE;
>      static final MemorySegment ALL = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);
> 
>      public static void main(String[] args) {
>          int[] ints = new int[ELEM_SIZE];
>          long unsafe_addr = Arena.ofAuto().allocate(BYTE_SIZE).address();
>          for (int i = 0; i < ints.length ; i++) {
>              ints[i] = i;
>          }
> 
>          State state = new State(ints, unsafe_addr);
>          for (int i = 0; i < 20_000; i++) {
>              payload(state);
>          }
>      }
> 
>      record State(int[] ints, long unsafe_addr) {}
> 
>      public static void payload(State state) {
>          MemorySegment.copy(state.ints(), 0, ALL, JAVA_INT, state.unsafe_addr(), ELEM_SIZE);
>      }
> 
> There are three checks we do to ensure safety:
> 1. a bounds check, which checks if the accessed address range is in bounds
> 2. an alignment check to see that the base address is aligned according 
> to the alignment of JAVA_INT
> 3. a liveness check to see that the segment being accessed is still alive
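
To make the three checks concrete, here is a minimal sketch that provokes each
one on an ordinary arena-allocated segment. It assumes Java 22 or later, where
java.lang.foreign is final (on Java 21 it would need --enable-preview). The
class and method names (ThreeChecks, outcome, boundsCheck, ...) are made up for
illustration and are not part of the JDK; the exception types are the ones the
MemorySegment javadoc lists for get.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

import static java.lang.foreign.ValueLayout.JAVA_INT;

// Hypothetical demo class, not part of the JDK.
class ThreeChecks {
    /** Runs the access and returns the exception it throws, or null if it succeeds. */
    static RuntimeException outcome(Runnable access) {
        try {
            access.run();
            return null;
        } catch (RuntimeException e) {
            return e;
        }
    }

    /** #1: reading 4 bytes at offset 16 of a 16-byte segment is out of bounds. */
    static RuntimeException boundsCheck() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16, 4);
            return outcome(() -> seg.get(JAVA_INT, 16));
        }
    }

    /** #2: offset 1 of a 4-byte-aligned segment violates JAVA_INT's alignment. */
    static RuntimeException alignmentCheck() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16, 4);
            return outcome(() -> seg.get(JAVA_INT, 1));
        }
    }

    /** #3: once the arena is closed, the segment is no longer alive. */
    static RuntimeException livenessCheck() {
        Arena arena = Arena.ofConfined();
        MemorySegment seg = arena.allocate(16, 4);
        arena.close();
        return outcome(() -> seg.get(JAVA_INT, 0));
    }

    public static void main(String[] args) {
        System.out.println("bounds:    " + boundsCheck());
        System.out.println("alignment: " + alignmentCheck());
        System.out.println("liveness:  " + livenessCheck());
    }
}
```

Each access trips exactly one check, so the three exceptions map one-to-one
onto the list above.
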
> 
> #2 can be removed on your end by using the JAVA_INT_UNALIGNED layout 
> instead. #3 shouldn't really be happening, since the scope of the ALL 
> segment is the global scope, which is always alive. I have a fix that 
> overrides the liveness check method in the GlobalSession class to do 
> nothing [2]. On my machine, addressing those two brings the performance of 
> the Panama-based copy very close to that of Unsafe (about 0.2 ns apart):
> 
> Benchmark                                    Mode  Cnt  Score   Error  Units
> MemorySegmentCopy.segment_copy_static_small  avgt   30  7.741 ± 0.017  ns/op
> MemorySegmentCopy.unsafe_copy_small          avgt   30  7.534 ± 0.011  ns/op
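
For reference, a minimal sketch of the copy with the unaligned layout, written
as a round trip through the same kind of zero-based "everything" segment so the
result can be checked. Assumptions: Java 22 or later; the class name
UnalignedCopy and the method roundTrip are hypothetical; and, unlike the quoted
snippet, the arena is kept open for the duration of the copy so that the raw
address stays valid. reinterpret is a restricted method, so the JDK may print a
native-access warning when this runs.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.Arrays;

import static java.lang.foreign.ValueLayout.JAVA_INT_UNALIGNED;

// Hypothetical demo class, not part of the JDK or of the benchmark.
class UnalignedCopy {
    // Zero-based segment spanning the whole address space, as in the quoted example.
    static final MemorySegment ALL = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    /** Copies src to native memory and back through ALL, addressing by raw address. */
    static int[] roundTrip(int[] src) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment backing = arena.allocate((long) src.length * Integer.BYTES);
            long addr = backing.address();

            // JAVA_INT_UNALIGNED has 1-byte alignment, so any address satisfies
            // the layout's alignment constraint.
            MemorySegment.copy(src, 0, ALL, JAVA_INT_UNALIGNED, addr, src.length);

            int[] out = new int[src.length];
            MemorySegment.copy(ALL, JAVA_INT_UNALIGNED, addr, out, 0, src.length);
            return out;
        }
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(Arrays.toString(roundTrip(ints)));
    }
}
```
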
> 
> Also, keep in mind that I'm just copying 10 elements here; for larger 
> copies the relative overhead would be smaller.
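
A rough back-of-the-envelope for the two scores reported above. The absolute
and relative gap follow directly from the numbers; the per-element figure
additionally assumes the gap is a fixed per-call cost that does not grow with
the copy size, which is an assumption and not something measured here. The
class and method names are hypothetical.

```java
// Hypothetical helper for illustration only.
class OverheadEstimate {
    /** Absolute gap between the segment-based and the Unsafe-based copy, in ns/op. */
    static double gapNs(double segmentNs, double unsafeNs) {
        return segmentNs - unsafeNs;
    }

    /** Gap as a fraction of the Unsafe-based score. */
    static double relativeGap(double segmentNs, double unsafeNs) {
        return gapNs(segmentNs, unsafeNs) / unsafeNs;
    }

    /** Gap per copied element, ASSUMING the gap is a fixed per-call cost. */
    static double perElementNs(double gapNs, int elements) {
        return gapNs / elements;
    }

    public static void main(String[] args) {
        double segment = 7.741; // segment_copy_static_small, ns/op
        double unsafe = 7.534;  // unsafe_copy_small, ns/op
        double gap = gapNs(segment, unsafe);

        System.out.printf("gap: %.3f ns/op (%.2f%% of the Unsafe score)%n",
                gap, 100.0 * relativeGap(segment, unsafe));
        for (int n : new int[] {10, 100, 1_000}) {
            System.out.printf("%5d elements: %.5f ns per element%n",
                    n, perElementNs(gap, n));
        }
    }
}
```

With the reported scores the gap is 0.207 ns/op, just under 3% of the Unsafe
figure for a 10-element copy.
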
> 
> #1 could technically also be removed by exposing a special memory 
> segment implementation that doesn't do any bounds checking. I'm not sure 
> we want to go there though, given the API surface/complexity we'd 
> introduce, for a very marginal gain. At least I'd like to discuss that 
> with Maurizio first (who's currently on vacation).
> 
> #3 is definitely something we can address, I think. I've filed: 
> https://bugs.openjdk.org/browse/JDK-8311594
> 
> HTH,
> Jorn
> 


More information about the panama-dev mailing list