RFR: 8276563: Undefined Behaviour in class Assembler [v3]

Thomas Stuefe stuefe at openjdk.java.net
Sat Nov 13 07:54:36 UTC 2021


On Thu, 11 Nov 2021 16:51:06 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
>> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example,
>> 
>> 
>>   typedef RegisterImpl *Register;
>>   const Register r10 = ((Register)10);
>> 
>> 
>> Registers have accessors, e.g.:
>> 
>> ` int RegisterImpl::encoding() const { return (intptr_t)this; }`
>> 
>> This works by an accident of implementation: it is not legal C++.
>> 
>> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity)
>> 
>> 
>>   extern RegisterImpl all_Registers[num_Registers];
>>   int RegisterImpl::encoding() const   { return this - all_Registers; }
>> 
>> 
>> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.
>> 
>> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way:
>> 
>> `  int RegisterImpl::encoding() const   { return _encoding; }`
>> 
>> This would result in smaller code, but I suspect slower.
>> 
>> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Re-establish the FloatRegister::successor() hack.

Hi @theRealAph,

I had a look at the changes. Note that I don't know arm very well, but reading the code has been very interesting.

Cheers, Thomas

src/hotspot/cpu/aarch64/register_aarch64.cpp line 56:

> 54: 
> 55: const char* FloatRegisterImpl::name() const {
> 56:   static const char *const names[number_of_registers] = {

While reading this code I noticed that this method is sensitive to changes to is_valid, therefore care has to be taken when changing its semantics. Or, maybe just extend the array to number_of_declared_registers and add strings for ZR and SP, just to be safe.

Update: my comment is in the wrong place, I meant to comment RegisterImpl::name().

src/hotspot/cpu/aarch64/register_aarch64.hpp line 44:

> 42:   inline friend const Register as_Register(int encoding);
> 43: 
> 44: private:

nit: private not needed

src/hotspot/cpu/aarch64/register_aarch64.hpp line 63:

> 61:   // accessors
> 62:   int encoding() const             { assert(is_valid(), "invalid register"); return encoding_nocheck(); }
> 63:   bool is_valid() const            { return this >= first() && this < invalid(); }

is_valid() now returns true for ZR, SP. Was this intended? This affects other functions too, e.g. `RegisterImpl::name()`.

src/hotspot/cpu/aarch64/register_aarch64.hpp line 64:

> 62:   int encoding() const             { assert(is_valid(), "invalid register"); return encoding_nocheck(); }
> 63:   bool is_valid() const            { return this >= first() && this < invalid(); }
> 64:   bool has_byte_register() const   { return is_valid(); }

Same here, semantics changed to include 32 and 33

src/hotspot/cpu/aarch64/register_aarch64.hpp line 255:

> 253:   enum {
> 254:     number_of_registers = 16,
> 255:     number_of_declared_registers = 16,

Is there a semantic difference between number of registers and declared registers for float and P registers? Otherwise, having two constants for the same thing makes the code less clear and maybe more error prone.

src/hotspot/share/asm/register.hpp line 62:

> 60: #else // USE_POINTERS_TO_REGISTER_IMPL_ARRAY
> 61: 
> 62: #define REGISTER_IMPL_DECLARATION(type, impl_type)                      \

nit: align backslashes?

src/hotspot/share/asm/register.hpp line 64:

> 62: #define REGISTER_IMPL_DECLARATION(type, impl_type)                      \
> 63: inline const type as_ ## type(int encoding) {                           \
> 64:   assert(encoding <= impl_type::number_of_declared_registers, "invalid register"); \

assert for >= -1 too?

src/hotspot/share/asm/register.hpp line 71:

> 69: 
> 70: #define REGISTER_IMPL_DEFINITION(type, impl_type)                    \
> 71: impl_type all_ ## type ## s[impl_type::number_of_declared_registers];

Would this not need attribute visibility too?

src/hotspot/share/asm/register.hpp line 87:

> 85: // OS-specific way.
> 86: #ifdef __GNUC__
> 87: #define INTERNAL_VISIBILITY  __attribute__ ((visibility ("internal")))

I try to understand this, is this to allow other object files to see these symbols while preventing them from being exported from the libjvm?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6280


More information about the hotspot-compiler-dev mailing list