RFR 8071627: Code refactoring to override == operator of Symbol*

Ioi Lam ioi.lam at oracle.com
Wed Apr 8 23:49:59 UTC 2015


CC-int hotspot-dev to include the wider audience.

On 4/8/15 3:24 PM, John Rose wrote:
> On Apr 2, 2015, at 5:48 PM, Calvin Cheung <calvin.cheung at oracle.com> wrote:
>> Please review this enhancement for Symbol comparison.
>>
>> This should allow future enhancement such as multiple SymbolTable and symbols with the same utf8 strings in different tables should be considered "equivalent".
>>
>> Although this changeset touches many files, the main change is in src/share/vm/oops/symbol.hpp with a new SymbolRef class. The rest of the change is mostly replacing Symbol* with SymbolRef.
>>
>> Since currently there's only one single SymbolTable, it isn't feasible to write a specific testcase for this enhancement. We will provide a testcase when this enhancement is used.
>>
>> Note also that the copyright header will be fixed before this changeset is committed.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8071627
>>
>> webrev: http://cr.openjdk.java.net/~ccheung/8071627/webrev.00/
>>
>> Tests:
>>      JPRT
>>      nsk.jvmti on linux_x64
>>
>>      the following performance benmarks on linux_x64, solaris_x64, solaris_sparc, mac, windows_x64 with no significant performance degradation:
>>          jetstream, scimark, specjbb2000, specjbb2005, specjvm98, volano25
>>
>>      some internal class loading performance tests
>>
>> thanks,
>> Calvin
> 8600 lines of patch in 143 files is a disruptive change of the first order.  All by itself it will add cost to most backports, including CPUs.
>
> I would like to understand the need for such a change before it goes into the repo.
>
> In particular, exactly how many uses of SymbolRef::operator== are there in the new code?  What would be the complexity of changing those occurrences only?  You can easily find the answer to this question by renaming SymbolRef::operator== to SymbolRef::identity_equals, debugging, and then counting the occurrences of the new name.
>
> (I know there's a forward-moving cost to leaving C++ pointer-equality as a possible source of bugs.  But we can deal with this in other ways than removing a C++ pointer type from our source base.)
>
> — John
John,

We need to do this because we are trying to support the loading of 
multiple CDS archives into the same application process. These CDS 
archives may be created independently, and concurrently. It will be a 
big management headache to require that "all references to the same UTF8 
name must be represented by a unique Symbol* pointer". That's why we 
want to relax the requirement, such that the same UTF8 string can be 
allocated multiple times, one in each CDS archive.

These UTF8 strings are used to reference things across CDS archives. For 
example, archive A may invoke a class "X" in archive B.

Both archives A and B are memory mapped images of HotSpot metadata 
objects (InstanceKlass, Method, ConstantPool, etc). For example, A may 
be mapped to the address range 1000 ~ 2000, and B mapped to 5000 ~ 6000.

In archive A, the class name "X" would be represented by a Symbol whose 
address is 1234;
In archive B, the class name "X" would be represented by a Symbol whose 
address is 5678;

We need to test that these two Symbols refer to the same name "X".

There are many places where (Symbol* == Symbol*) are tested inside the 
current hotspot code. Unfortunately, if we allow multiple Symbols to 
refer to the same name, all of such code need to be changed. We cannot 
just selectively change some of them to Symbol::equals().

Also, even if we change all (Symbol* == Symbol*)  to 
(symbolA->equals(symbolB)), someone may by mistake introduce a new line 
of (Symbol* == Symbol*), and it will be very hard to detect such changes 
automatically and reliably.

Yes, I understand that the change is big, but it looks more scary than 
it actually is. Most of the changes in the webrev are mechanical 
changes, such as replacing "Symbol*" with "SymbolRef". It would cause 
some pain in back ports, but should be manageable.

Also, after this change, most code will NOT use Symbol anymore, and it 
will be very hard for you to try to get to a Symbol* pointer. So the 
chance of someone polluting the code again with a (Symbol* == Symbol*) 
line is very low.

- Ioi




More information about the hotspot-runtime-dev mailing list