RFR (S) JDK-8009575 - Reduce Symbol::_refcount from 4 bytes to 2 bytes
Ioi Lam
ioi.lam at oracle.com
Wed Mar 27 11:16:29 PDT 2013
On 03/27/2013 02:30 AM, David Holmes wrote:
>>>
>> The old code was
>>
>> class Symbol : public MetaspaceObj {
>> int _refcount;
>> int _identity_hash;
>> unsigned short _length;
>> jbyte _body[1];
>>
>> static int size(int length) {
>> size_t sz = heap_word_size(sizeof(Symbol) + (length > 0 ? length -
>> 1 : 0));
>> return align_object_size(sz);
>> }
>>
>> It assumes that sizeof(Symbol) is 11 bytes (2 ints + 1 short + 1 byte),
>> hence the "length - 1". I.e., if length == 2, size() => 11 + 2 - 1 = 12.
>
> That is not the way I read it. I assumed the _body[] was one of those
> variables we would manually extend. If the length argument to size was
> 2 then it says "I want space for two bytes" - but we already allocated
> one of them for _body, hence we use length-1 added to sizeof(Symbol).
>
> Now as you point out we may over allocate because of alignment. BUt I
> didn't see that directly connected to the use of "length-1".
I think we are reading it the same way -- "Symbol has already allocated
1 byte (or more) for the string, so I need to ask for length-1 more
bytes". The problem the "or more" is always true and always 1 extra byte
has been allocated by the compiler.
So after I reduce the size of refcount by 2, the compiler actually
allocates 3 extra bytes after body[0] -- to keep size(Symbol) aligned by
4. Without fixing the size() function, I would have seen no gain at all.
After fixing Symbol::size(), I found a bug in UTF8::as_quoted_ascii()
that would cause random crashes on Win32 (see "HERE" below in the code):
void Symbol::print_symbol_on(outputStream* st) const {
st = st ? st : tty;
st->print("%s", as_quoted_ascii());
}
char* Symbol::as_quoted_ascii() const {
const char *ptr = (const char *)&_body[0];
int quoted_length = UTF8::quoted_ascii_length(ptr, utf8_length());
char* result = NEW_RESOURCE_ARRAY(char, quoted_length + 1);
UTF8::as_quoted_ascii(ptr, result, quoted_length + 1);
return result;
}
// converts a utf8 string to quoted ascii
void UTF8::as_quoted_ascii(const char* utf8_str, char* buf, int buflen) {
const char *ptr = utf8_str;
char* p = buf;
char* end = buf + buflen;
while (*ptr != '\0') { <<<<<<<<<<<<<<<<<<<<<<<<HERE
jchar c;
ptr = UTF8::next(ptr, &c);
if (c >= 32 && c < 127) {
if (p + 1 >= end) break; // string is truncated
*p++ = (char)c;
} else {
if (p + 6 >= end) break; // string is truncated
sprintf(p, "\\u%04x", c);
p += 6;
}
}
*p = '\0';
}
The (*ptr != '\0') check in UTF8::as_quoted_ascii assumes that it's OK
to read one byte past the end of the utf8_str. This byte may not be
zero, but the following (p+1 >= end) check would ensure the loop terminates.
However, after I fixed the Symbol::size() function, I have a case where
Symbol x = 0x00003ff0{
_length = 8;
_body[] = "activate" // &_body[0] = 0x00003ff8
};
x.size() == 16 bytes
The first byte after the end of "activate" is at 0x00004000, i.e., at a
different page. The symbol is allocated by malloc. On Windows, the page
immediately following the newly allocated space could be unmapped. So
the (*ptr != '\0') line would crash with a faulting address of
0x00004000. I saw this in a JPRT crash dump.
The fix would be to add a utf8_length parameter to UTF8::as_quoted_ascii
-- similar to the existing function UNICODE::as_quoted_ascii(const
jchar* base, int length, char* buf, int buflen).
This bug could also crash the existing code if the first byte past the
end of the string is a UTF8 lead byte, which would cause extra reading
in UTF8::next(). We probably have never seen the crash because (I am
guessing)
(1) only Windows would have unmapped space immediately following a
malloc'ed block.
(2) windows fills the malloc'ed block with zeros, so UTF8::next() reads
only 1 byte.
I think I will file a separate bug for the UTF8::as_quoted_ascii, and
fix it first before committing the _refcount fix.
- Ioi
More information about the hotspot-runtime-dev
mailing list