RFR: 8276563: Undefined Behaviour in class Assembler

Jorn Vernee jvernee at openjdk.java.net
Tue Nov 9 13:34:39 UTC 2021


On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley <aph at openjdk.org> wrote:

> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example,
> 
> 
>   typedef RegisterImpl *Register;
>   const Register r10 = ((Register)10);
> 
> 
> Registers have accessors, e.g.:
> 
> ` int RegisterImpl::encoding() const { return (intptr_t)this; }`
> 
> This works by an accident of implementation: it is not legal C++.
> 
> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity)
> 
> 
>   extern RegisterImpl all_Registers[num_Registers];
>   int RegisterImpl::encoding() const   { return this - all_Registers; }
> 
> 
> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.
> 
> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way:
> 
> `  int RegisterImpl::encoding() const   { return _encoding; }`
> 
> This would result in smaller code, but I suspect slower.
> 
> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.

LGTM.

I noticed this is only fixed on AArch64, but AFAICS we could have the same problem on x86.

src/hotspot/cpu/aarch64/register_aarch64.hpp line 44:

> 42: extern name all_ ## type ## s[name::number_of_declared_registers] INTERNAL_VISIBILITY; \
> 43: constexpr type first_ ## type = all_ ## type ## s;                      \
> 44: inline constexpr type name::first() { return all_ ## type ## s; }

Same here:

Suggestion:

// Macros to help define all kinds of registers

#define REGISTER_IMPL_DECLARATION(type, impl_type)                           \
inline const type as_ ## type(int encoding) {                           \
  assert(encoding <= impl_type::number_of_declared_registers, "invalid register"); \
  return encoding == -1 ? impl_type::invalid() : impl_type::first() + encoding;   \
}                                                                       \
extern impl_type all_ ## type ## s[impl_type::number_of_declared_registers] INTERNAL_VISIBILITY; \
constexpr type first_ ## type = all_ ## type ## s;                      \
inline constexpr type impl_type::first() { return all_ ## type ## s; }

src/hotspot/cpu/aarch64/register_aarch64.hpp line 47:

> 45: 
> 46: #define REGISTER_IMPL_DEFINITION(type, name)                    \
> 47: name all_ ## type ## s[name::number_of_declared_registers];

The use of the macro parameters `type` and `name` here, is a bit confusing since they mean something else in the `CONSTANT_REGISTER_DECLARATION` macro below.

I'd suggest changing the parameter names to `type` and `impl_type` instead, to reflect that they are `<regtype>` and `<regtype>Impl`

Suggestion:

#define REGISTER_IMPL_DEFINITION(type, impl_type)                    \
impl_type all_ ## type ## s[impl_type::number_of_declared_registers];

src/hotspot/cpu/aarch64/register_aarch64.hpp line 179:

> 177:   // accessors
> 178:   bool is_valid() const           { return this < invalid(); }
> 179:   bool has_byte_register() const  { return is_valid(); }

Is `has_byte_register` needed here? I see it in the previous code for `Register` but not for `FloatRegister`.
Suggestion:

src/hotspot/cpu/aarch64/register_aarch64.hpp line 286:

> 284:   // accessors
> 285:   bool is_valid() const          { return this < invalid(); }
> 286:   bool has_byte_register() const { return is_valid(); }

Same here, seems to be added but not used.
Suggestion:

-------------

Marked as reviewed by jvernee (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6280


More information about the hotspot-compiler-dev mailing list