RFC (round 1), JEP draft: Low-level Object layout introspection methods

Wed Aug 19 12:10:36 UTC 2020

None of these APIs that return distances (except perhaps 
mangledAddressesOf) gives any information about alignment relative to 
power-of-2 address multiples (cache lines, for example). So suppose we 
modify the last (4th) distance API variant to also return alignments:

long[] estimatedDistancesAndAlignmentsInBytes(ObjectField[] 
objectFields, int bits); // 0 < bits <= 16 for example (a mask of at 
most 2^16 - 1), so as not to expose too much of the real addresses

where r[2*i] of the returned array represents a distance: 
realAddressOf(f[i]) - realAddressOf(f[0])
and r[2*i+1] of the returned array represents an alignment: 
realAddressOf(f[i]) & ((1L << bits) - 1)

...then I think everything needed for performance analysis can be 
derived from this information.
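
As a rough illustration of that derivation, here is a minimal sketch 
that decodes the proposed array layout. Nothing here is an existing JDK 
API: simulate() is a hypothetical stand-in fed with made-up addresses, 
and positionOf()/sameLine() are names invented for this example. The 
idea is that r[1] (the low bits of f[0]) plus r[2*i] gives the position 
of f[i] relative to the 2^bits-aligned block containing f[0], which is 
enough to tell whether two fields start in the same cache line, as long 
as the line size does not exceed 2^bits.

```java
// Sketch only: decodes the array shape proposed above. No such JDK API exists.
class DistancesAndAlignments {

    // Hypothetical stand-in for the proposed API, driven by made-up addresses
    // so that the decoding logic below can be exercised.
    static long[] simulate(long[] addresses, int bits) {
        long mask = (1L << bits) - 1;
        long[] r = new long[2 * addresses.length];
        for (int i = 0; i < addresses.length; i++) {
            r[2 * i] = addresses[i] - addresses[0]; // distance from f[0]
            r[2 * i + 1] = addresses[i] & mask;     // low 'bits' bits of the address
        }
        return r;
    }

    // Position of f[i] relative to the 2^bits-aligned block that contains f[0].
    // It may be negative or exceed 2^bits when f[i] lies in another block.
    static long positionOf(long[] r, int i) {
        return r[1] + r[2 * i];
    }

    // True if f[i] and f[j] start in the same 2^lineBits-byte line.
    // Only meaningful when lineBits <= the 'bits' value used for the query.
    static boolean sameLine(long[] r, int i, int j, int lineBits) {
        // Arithmetic shift rounds toward negative infinity, so negative
        // positions are assigned to the correct line as well.
        return (positionOf(r, i) >> lineBits) == (positionOf(r, j) >> lineBits);
    }
}
```

With bits = 12 and lineBits = 6, for instance, sameLine() answers the 
64-byte cache-line question without any real address being revealed.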

Regards, Peter

On 8/19/20 1:46 PM, Peter Levart wrote:
>
>
> On 8/19/20 1:35 PM, Peter Levart wrote:
>> Hi,
>>
>> On 8/17/20 5:13 PM, Erik Österlund wrote:
>>> Perhaps, another way of answering the same question without the 
>>> addressOf API, is to have an estimatedDistanceInBytes(Object o1, 
>>> Object o2) or even an
>>> estimatedDistanceInBytes(Object o1, Field f1, Object o2, Field f2) 
>>> API. This API could run in a mode where there are no
>>> safepoints, and ensure that all of the above mentioned "impossible" 
>>> situations actually remain impossible, and hence answer the 
>>> high-level question more effectively.
>>> It would also importantly never expose any addresses or offsets, 
>>> while still allowing various locality heuristics to be computed by 
>>> performance people. 
>>
>> ...still, if you wanted to obtain the estimated distances among all 
>> (object, field) pairs in a set of objects, querying each pair 
>> individually would not give you a snapshot view of the whole set. 
>> If this is important, one would perhaps need an API like this:
>>
>> record ObjectField(Object object, Field field) {}
>>
>> long[][] estimatedDistancesInBytes(ObjectField[] objectFields)
>>
>> This API returns a matrix of distances (let's say the lower triangle 
>> without the diagonal), a total of n*(n-1)/2 distances for n 
>> ObjectField instances. This requires quadratic space. For 1M objects, 
>> the API would return about half a trillion distances (about 4 TB at 
>> 8 bytes each). This might be a problem.
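
The space estimate can be checked with a small sketch. The class and 
method names are made up for illustration, and the flat row-major 
layout in flatIndex() is only one assumed way of storing such a 
triangle, not something the proposal specifies.

```java
// Sketch only: arithmetic for a lower-triangle distance matrix without diagonal.
class TriangleMatrix {

    // Number of unordered pairs among n elements: n * (n - 1) / 2.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Bytes needed to hold one long per pair (array headers ignored).
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    static long payloadBytes(long n) {
        return Math.multiplyExact(pairCount(n), (long) Long.BYTES);
    }

    // Index of pair (i, j), with i > j >= 0, in an assumed flat row-major
    // layout: row i holds the i entries (i, 0) .. (i, i - 1).
    static long flatIndex(long i, long j) {
        return i * (i - 1) / 2 + j;
    }
}
```

For n = 1,000,000 this gives 499,999,500,000 pairs and 
3,999,996,000,000 bytes of payload.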
>>
>> one could argue that if distances are a signed number (not absolute), 
>> so that distance(f1, f2) == -distance(f2, f1), the API could also be 
>> like:
>>
>> long[] estimatedDistancesInBytes(ObjectField[] objectFields)
>>
>> where the i-th element d[i] of the returned array would represent 
>> distance(f[i], f[i+1]). To get the distance from f[x] to f[y] 
>> (y >= x), you would then just sum(d[i]; x <= i < y).
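
A minimal sketch of that summation follows. It assumes the sign 
convention distance(a, b) = address(b) - address(a) and that d[i] = 
distance(f[i], f[i+1]), so the distance from f[x] to f[y] is the sum of 
d[x] through d[y-1]. The adjacent() helper is a hypothetical stand-in 
that builds d from made-up addresses; prefixSums() is an optional 
addition, not part of the proposal, that turns repeated queries into a 
single subtraction.

```java
// Sketch only: working with adjacent signed distances d[i] = distance(f[i], f[i+1]).
class AdjacentDistances {

    // Hypothetical stand-in for the API, built from made-up addresses.
    static long[] adjacent(long[] addresses) {
        long[] d = new long[addresses.length - 1];
        for (int i = 0; i < d.length; i++) {
            d[i] = addresses[i + 1] - addresses[i];
        }
        return d;
    }

    // Distance from f[x] to f[y] for x <= y: sums y - x elements.
    static long distance(long[] d, int x, int y) {
        long sum = 0;
        for (int i = x; i < y; i++) {
            sum += d[i];
        }
        return sum;
    }

    // p[k] = d[0] + ... + d[k-1]; then distance(x, y) == p[y] - p[x].
    static long[] prefixSums(long[] d) {
        long[] p = new long[d.length + 1];
        for (int i = 0; i < d.length; i++) {
            p[i + 1] = p[i] + d[i];
        }
        return p;
    }
}
```

Note that the prefix sums are exactly the distances from f[0], so 
precomputing them yields the same information as a list of offsets 
relative to the first element.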
>>
>> OTOH, an API like:
>>
>> long[] estimatedAddressesOf(ObjectField[] objectFields)
>>
>> would also work and would not require summing y-x numbers to get the 
>> distance from f[x] to f[y], just calculating one difference: a[y] - 
>> a[x]. If addresses are mangled (the same random offset is added to 
>> each), the API doesn't expose any more of the internals than the 
>> alternative APIs that return distances. I would also say that 
>> exposing distances admits the existence of points with a location 
>> (i.e. addresses), so even philosophically you don't expose more 
>> internals.
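
To illustrate why a shared offset hides the addresses but not the 
distances, here is a minimal sketch. mangle() is a hypothetical 
stand-in, not a real API, and the offset is passed in explicitly so the 
example is deterministic; an actual implementation would presumably 
pick it at random.

```java
// Sketch only: a common offset cancels out when two mangled addresses are subtracted.
class MangledAddresses {

    // Hypothetical stand-in: adds the same offset to every address.
    // Java long addition wraps on overflow (two's complement).
    static long[] mangle(long[] addresses, long offset) {
        long[] a = new long[addresses.length];
        for (int i = 0; i < a.length; i++) {
            a[i] = addresses[i] + offset;
        }
        return a;
    }

    // Distance from f[x] to f[y]: one subtraction, in which the offset cancels.
    // Wrapping subtraction undoes any wrap-around introduced by mangle().
    static long distance(long[] a, int x, int y) {
        return a[y] - a[x];
    }
}
```

Because the arithmetic is modular, the recovered distance is exact even 
when adding the offset overflows.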
>>
>>
>> Regards, Peter
>>
> A fourth variant could be an API like:
>
> long[] estimatedDistancesInBytes(ObjectField[] objectFields)
>
> where the i-th element d[i] of the returned array represents a 
> distance(f[0], f[i]).
>
> This is similar to addressesOf where the offset added to each 
> returned address is -addressOf(f[0]).
>
> Peter
>