RFR: 8276563: Undefined Behaviour in class Assembler [v9]

Thomas Stuefe stuefe at openjdk.java.net
Thu Nov 18 06:46:43 UTC 2021


On Wed, 17 Nov 2021 14:35:11 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
>> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example,
>> 
>> 
>>   typedef RegisterImpl *Register;
>>   const Register r10 = ((Register)10);
>> 
>> 
>> Registers have accessors, e.g.:
>> 
>> ` int RegisterImpl::encoding() const { return (intptr_t)this; }`
>> 
>> This works by an accident of implementation: it is not legal C++.
>> 
>> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity)
>> 
>> 
>>   extern RegisterImpl all_Registers[num_Registers];
>>   int RegisterImpl::encoding() const   { return this - all_Registers; }
>> 
>> 
>> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.
>> 
>> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way:
>> 
>> `  int RegisterImpl::encoding() const   { return _encoding; }`
>> 
>> This would result in smaller code, but I suspect slower.
>> 
>> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Tweako stuff.

I was confused that there were no x86 changes, or is that part of a future RFE?

Mostly nits and questions remain.

Cheers, Thomas

src/hotspot/cpu/aarch64/register_aarch64.hpp line 56:

> 54: 
> 55:   // construction
> 56:   inline friend constexpr Register as_Register(int encoding);

This is getting bikesheddy so feel free to ignore:

Instead if using friend I would probably either just make `first()` public. 
Or, make as_xxx a static class method and wrap it with a global scope wrapper like this:


--- a/src/hotspot/cpu/aarch64/register_aarch64.hpp
+++ b/src/hotspot/cpu/aarch64/register_aarch64.hpp
@@ -53,7 +53,8 @@ public:
   const Register successor() const { return this + 1; }

   // construction
-  inline friend constexpr Register as_Register(int encoding);
+  static constexpr Register as(int encoding);
+


diff --git a/src/hotspot/share/asm/register.hpp b/src/hotspot/share/asm/register.hpp
index 06a8735f520..7d9036b0ff6 100644
--- a/src/hotspot/share/asm/register.hpp
+++ b/src/hotspot/share/asm/register.hpp
@@ -59,9 +59,12 @@ enum { name##_##type##EnumValue = (value) }
 #else // USE_POINTERS_TO_REGISTER_IMPL_ARRAY
 
 #define REGISTER_IMPL_DECLARATION(type, impl_type, reg_count)           \
-inline constexpr type as_ ## type(int encoding) {                       \
+inline constexpr type impl_type::as(int encoding) {                     \
   return impl_type::first() + encoding;                                 \
 }                                                                       \
+inline constexpr type as_ ## type(int encoding) {                       \
+  return impl_type::as(encoding);                                       \
+}                                                                       \
 extern impl_type all_ ## type ## s[reg_count + 1] INTERNAL_VISIBILITY;  \
 inline constexpr type impl_type::first() { return all_ ## type ## s + 1; }

src/hotspot/cpu/aarch64/register_aarch64.hpp line 63:

> 61:   int encoding() const             { assert(is_valid(), "invalid register"); return encoding_nocheck(); }
> 62:   bool is_valid() const            { return (unsigned)encoding_nocheck() < number_of_registers; }
> 63:   bool has_byte_register() const   { return this >= first() && this - first() < number_of_byte_registers; }

Why not relegate to encoding_nocheck() too: `return encoding_nocheck() >= 0 && encoding_nocheck() < num_byte_regs` ?

src/hotspot/cpu/aarch64/register_aarch64.hpp line 156:

> 154:   FloatRegister successor() const {
> 155:     return as_FloatRegister((encoding() + 1) % (unsigned)number_of_registers);
> 156:   }

Different from the other two, why? If we need validity checks here, should we not do them with the other types too?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6280


More information about the hotspot-compiler-dev mailing list