RFR 8209138: Symbol constructor uses u1 as the element type of its name argument
David Holmes
david.holmes at oracle.com
Tue Oct 2 01:45:52 UTC 2018
Hi Harold,
Thanks for persevering with this!
On 2/10/2018 6:23 AM, Harold David Seigel wrote:
> Hi Coleen,
>
> Please review this updated webrev:
>
> http://cr.openjdk.java.net/~hseigel/bug_8209138.3/webrev/index.html
>
> It uses u1 for Symbol instead of char. I also added a "char char_at(int
> index)" function to class Symbol. It's for callers that want a char
> from a Symbol instead of a u1.
I was in two minds about this as we really are comparing the i'th byte
with a given char value. But I suppose if I read char_at(i) as
byte_as_char_at(i) then it's just a type-conversion operation. Though I
am curious what byte_at usages remain after this change?
src/hotspot/share/ci/ciSymbol.hpp
Can you humour me and please change this comment:
// Return the i-th char, where i < utf8_length
to:
// Return the i-th utf byte as a char, where i < utf8_length
---
src/hotspot/share/oops/symbol.hpp
char char_at(int index) const {
assert(index >=0 && index < length(), "symbol index overflow");
return base()[index];
}
I'm surprised this doesn't need a (char) cast to keep the compiler
happy. Some compilers may need it I think and it doesn't hurt to add it.
---
src/hotspot/share/runtime/vmStructs.cpp
There's a comment in:
./jdk.hotspot.agent/share/classes/sun/jvm/hotspot/types/Field.java
that needs updating:
FIXME: among other things, this interface is not sufficient to
describe fields which are themselves arrays (like Symbol's
jbyte _body[1]).
s/jbyte/u1/
Also:
./jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Symbol.java
accesses the array as jbytes - that may need adjusting.
----
Otherwise all seems fine.
Thanks,
David
-----
> Thanks! Harold
>
>
> On 9/27/2018 10:03 AM, coleen.phillimore at oracle.com wrote:
>>
>> Hi Harold, I think we've decided to make u1 the type we carry around
>> in Symbol instead of char, since that's more accurate to represent
>> utf8. Since the goal is to make the types consistent, can you try to
>> change it all to u1 instead. Keeping the name get_byte() and see how
>> bad the casting situation is?
>> thank you!
>> Coleen
>>
>>
>> On 9/27/18 9:17 AM, Harold David Seigel wrote:
>>> Hi David,
>>>
>>> Please review this updated webrev:
>>>
>>> http://cr.openjdk.java.net/~hseigel/bug_8209138.2/webrev/index.html
>>>
>>> It contains additional changes to use char for class Symbol.
>>>
>>> Hopefully, your concerns w.r.t. signed and unsigned types were
>>> resolved by our offline discussions.
>>>
>>> Thanks, Harold
>>>
>>>
>>> On 8/21/2018 2:24 AM, David Holmes wrote:
>>>> On 21/08/2018 3:19 PM, Kim Barrett wrote:
>>>>>> On Aug 20, 2018, at 8:53 PM, David Holmes
>>>>>> <david.holmes at oracle.com> wrote:
>>>>>>
>>>>>> Hi Harold,
>>>>>>
>>>>>> On 21/08/2018 12:43 AM, Harold David Seigel wrote:
>>>>>>> Hi,
>>>>>>> Please review this change for bug JDK-8209138. The fix changes
>>>>>>> class Symbol in symbol.hpp to use type char instead of types u1
>>>>>>> and jbyte and renames relevant functions by replacing 'byte' with
>>>>>>> 'char'. For example, 'Symbol::byte_at_put()' is now
>>>>>>> 'Symbol::char_at_put()'.
>>>>>>
>>>>>> It's not clear to me exactly what all these things should be. In a
>>>>>> lot of places we seem to be dealing with UTF8 character
>>>>>> representations, which to me should be u1 (afterall that is how
>>>>>> they are specified in the classfile format!) which is just an
>>>>>> unsigned "byte". But char is signed (in our build).
>>>>>
>>>>> Not sure what’s meant by “in our build” but char is unsigned when
>>>>> building HotSpot for aarch64 / arm. See, for example, JDK-8209586.
>>>>
>>>> Ah! We have always set -fsigned-char for the Oracle ARM and ARM64
>>>> builds (and previously PPC32), but now I see that, as you say,
>>>> Aarch64 does not set this (nor PPC64). Interesting and unfortunate.
>>>> I'm somewhat surprised that we haven't encountered more issues due
>>>> to this.
>>>>
>>>> In any case we are still converting from an unsigned type to a
>>>> signed type in many cases with this change.
>>>>
>>>> David
>>>> -----
>>>>
>>>>>
>>>>>> The current change just seems to push out the boundary where we
>>>>>> convert between things. For example ciSymbol now has a char
>>>>>> base()** but still has a byte_at() method - should that not know
>>>>>> be char_at() for consistency? Especially given byte_at() returns
>>>>>> get_symbol()->char-at()
>>>>>>
>>>>>> ** Granted the existing jbyte seems the wrong choice too.
>>>>>>
>>>>>> I think we have a lot of type confusion in our code, but it isn't
>>>>>> clear to me that this particular change is necessarily changing
>>>>>> things in the right direction.
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>>
>>>>>>> Open Webrev:
>>>>>>> http://cr.openjdk.java.net/~hseigel/bug_8209138/webrev/index.html
>>>>>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8209138
>>>>>>> The change was tested by running Mach5 tiers 1 and 2 tests and
>>>>>>> builds on Linux-x64, Windows, and Mac OS X, running tiers 3-5
>>>>>>> tests on Linux-x64, and by running JCK-11 API, Lang and VM tests
>>>>>>> on Linux-x64.
>>>>>>> Thanks, Harold
>>>>>
>>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list