RFR: 8243996: Remove hardcoded field offsets from HotSpot

Thu May 14 09:30:27 UTC 2020

Hi John,

Thank you for reviewing this change and sharing your thoughts.

On 2020-05-13 18:00, John Rose wrote:
> I like this cleanup!
>
> There are new tricky dependencies between the interpreter and other
> runtime bits, and class loading; they are adequately documented in new
> comments.
>
> A few comments of my own:
>
> The whitespace on the new line in vmSymbols.hpp needs to be adjusted to
> resemble whitespace on neighboring lines.  Hotspot is a jungle of old code,
> and our new code needs to blend into that jungle, not make spots of new
> color (unless of course there is a purpose to some new color).

Ah, missed that. Fixed. Emacs nearly dies every time I open that file; 
looks like I
was too eager to get out of that file.

> I don’t want to gripe needlessly about identifier choice, but… maybe I
> do in this case:  “interpreter_init_code” and “interpreter_init_code_stub”
> seem overly cryptic to me; they don’t help me remember what they do.
> I suggest that“interpreter_init_stubs” and “interpreter_init_code” or
> “interpreter_init_stub_code” and “interpreter_init_main_code” would
> be easier on maintainers.  But if it’s just me, leave it as is.  We coders eat
> cryptic identifiers for breakfast.

Ah yes. Coleen and I have already been through a few rounds of naming 
for this.
I started off with initialize1 and initialize2, similar to other 
initialization functions.
But Coleen accurately pointed out those names do not reflect what it 
does. So I
changed the name to initialize_code_stub and initialize_generate, but Coleen
accurately pointed out that initialize_generate is grammatically 
incorrect. So I changed
it to initialize_code to be grammatically correct yet still state what 
is being initialized.
But I agree with you that unfortunately the difference between 
initializing a code stub
and code is a bit unclear. I liked your first suggestion more than the 
second suggestion,
so let's take that for a spin and see if anyone else wants a different name.

> Can someone remind me (a) which platforms use the C++ interpreter,
> and (b) how well they are tested?

Unfortunately, I do not know.

> I spot-checked for queries to the hard-coded offsets, and found enough
> “guarantee(this_offset != 0…)” to make me feel secure.  If there’s a bug
> related to this change, it’s quite possibly a zero offset being used on some
> path in the JVM which is rarely executed.  (Such a bug could be introduced
> in the future, right?)  Usually we avoid such bugs by encapsulating variable
> state into C++ access functions which perform their own assertions, rather
> than requiring users of the variable to make their own checks.

Right.

> Did you look into making all those public variables private, and wrapping
> access functions around them?  I think that would make for a more complete
> cleanup.  It would touch more places in the interpreter, including platform
> code, which I think would be a good thing.  It would (a) allow review now
> of dependency checks in the interpreter, and (b) force downstream porters
> to similarly review their use of hard-coded offsets.

Indeed I noted that odd old colour, but was not sure I had a strong 
enough reason to change it to a
new colour, given the fan out. ;)

Among the public variables you suggest making private, It seems that 
only the referent is used
in platform specific code (interpreter), and is used in the same way on 
all platforms. And the whole point
of this exercise was to remove *all* dependencies from interpreter code 
to offset initialization order.
In other words, before the interpreter code had to explicitly know 
exactly which offsets it could vs could
not use, based on whether they were hardcoded or not. Now it can use all 
offsets as it pleases, by design.
Therefore I don't think we achieve neither (a) nor (b) by changing this 
colour to a new one.

Having said that, it certainly is nicer and good practice to make these 
variables private and use accessor
functions for them, for stylistic, consistency and encapsulation 
reasons. I also noticed that half of the
offset variables use a leading '_' and the others do not, making the 
code quite inconsistent. I could send
a new RFR trying to normalize the style of the various offsets in 
JavaClasses to use a leading '_', be private
and have an accessor function that asserts offset sanity. What do you think?

Not sure how I got here implementing support for compressed class 
pointers in the absence of compressed
oops. Oh dear...

New webrev so far:
http://cr.openjdk.java.net/~eosterlund/8243996/webrev.02/

Incremental:
http://cr.openjdk.java.net/~eosterlund/8243996/webrev.01..02/

Thanks,
/Erik

> — John
>
>> On May 13, 2020, at 8:26 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>
>> Hi,
>>
>> Let's remove hardcoded field offsets. Main trick to get this done is to split interpreter initialization
>> into two parts: 1. reserve the interpreter memory, 2. generate the interpreter. This allows reserving
>> it before loading methods (which generates i2c adapters that assert things are "in the interpreter",
>> requiring knowledge about its address range), and then load the methods, and then generate the interpreter.
>> This moving of interpreter code generation until after methods have been loaded allows offsets to be
>> looked up, needed by code generation. That way, we no longer need to hardcode field offsets, which
>> makes it less annoying to change the field layout code in the future.
>>
>> Thanks to Coleen for pre-reviewing this.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8243996
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8243996/webrev.01/
>>
>> Thanks,
>> /Erik