RFR: 8276563: Undefined Behaviour in class Assembler

Jorn Vernee jvernee at openjdk.java.net
Sat Nov 6 16:19:37 UTC 2021


On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley <aph at openjdk.org> wrote:

> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example,
> 
> 
>   typedef RegisterImpl *Register;
>   const Register r10 = ((Register)10);
> 
> 
> Registers have accessors, e.g.:
> 
> ` int RegisterImpl::encoding() const { return (intptr_t)this; }`
> 
> This works by an accident of implementation: it is not legal C++.
> 
> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity)
> 
> 
>   extern RegisterImpl all_Registers[num_Registers];
>   int RegisterImpl::encoding() const   { return this - all_Registers; }
> 
> 
> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.
> 
> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way:
> 
> `  int RegisterImpl::encoding() const   { return _encoding; }`
> 
> This would result in smaller code, but I suspect slower.
> 
> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.

It's not clear to me why `Register` is implemented as a pointer in the first place, instead of a class with a single `int` or `intptr_t` field for the encoding. There is a comment about that in register.hpp, but it doesn't offer an explanation:

https://github.com/openjdk/jdk/blob/2653cfbf0f316183ea23dd234896b44f7dd6eae0/src/hotspot/share/asm/register.hpp#L37-L41

I wonder if changing the implementation to define `AbstractRegister` as:

    class AbstractRegister {
      int _value;
    protected:
      int value() { return _value; }
    };

have a class `Register` that extend from `AbstractRegister`, and remove all the `typedef RegisterImpl* Register` and similar, wouldn't be a viable solution as well? The implementations of `encoding()` could simply call `value()` after doing a validity  check (they already do in most cases).

To save us from having to change `->` to `.` everywhere, an overloaded `operator->` could be added to each implementation class that just returns `this` ([Godbolt](https://godbolt.org/z/Tajhc6s83)).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6280


More information about the hotspot-compiler-dev mailing list