Request for review 6924920: Class Data Sharing limit on the java version string can create failures: JVM Ident field too long

Mon Nov 26 05:32:25 PST 2012

Hi David,

Thanks again for the review.  I used a template because I couldn't find 
a better way to check the size of the header_version buffer.

Also, I'll look into changing the size_t to int.

Thanks, Harold

On 11/22/2012 10:03 PM, David Holmes wrote:
> On 22/11/2012 10:49 PM, David Holmes wrote:
>> On 22/11/2012 4:50 AM, harold seigel wrote:
>>> Hi,
>>>
>>> Please review this new version of the fix for bug 6924920:
>>>
>>> http://cr.openjdk.java.net/~hseigel/bug_6924920_3/
>>> <http://cr.openjdk.java.net/%7Ehseigel/bug_6924920_3/>
>>>
>>> This fix contains the following changes:
>>>
>>> 1. Uses of a template to verify that the incoming buffer to function
>>> get_header_version() is the correct size.
>>
>> Certainly more complex than I had expected - not that familiar with
>> template functions. But curiously for the second time today I see a
>> size_t variable being used in an array index expression:
>>
>> template <size_t N> static void get_header_version(char
>> (&header_version) [N]) {
>>
>> where it should be an int (according to ISO C standard).
>
> I humbly withdraw that comment. It is required to be an "integer type" 
> which includes both signed and unsigned and various sizes.
>
> I certainly prefer to use only int values. YMMV.
>
> David
> ------
>
>>> 2. Calls a better hash function.
>>>
>>> 3. Corrects a null termination problem.
>>
>> Seems okay.
>>
>> David
>> -----
>>
>>> Thank you,
>>> Harold
>>>
>>> On 11/16/2012 9:01 AM, harold seigel wrote:
>>>> Hi David,
>>>>
>>>> That's a good point. I'll add a check that the incoming argument is
>>>> the expected length.
>>>>
>>>> Thanks, Harold
>>>>
>>>> On 11/15/2012 10:46 PM, David Holmes wrote:
>>>>> Hi Harold,
>>>>>
>>>>> The only nit I have with factoring the logic into a separate function
>>>>> is that the function "knows" that only _jvm_ident is going to be
>>>>> passed, as it assumes what the size is.
>>>>>
>>>>> Otherwise it seems functionally correct.
>>>>>
>>>>> David
>>>>>
>>>>> On 16/11/2012 12:20 AM, harold seigel wrote:
>>>>>> Please review these updated changes to fix bug 6924920. The proposed
>>>>>> changes implement what was discussed in my 10/11/2012 reply to 
>>>>>> Peter.
>>>>>>
>>>>>> The updated changes are available at
>>>>>> http://cr.openjdk.java.net/~hseigel/bug_6924920_2/
>>>>>> <http://cr.openjdk.java.net/%7Ehseigel/bug_6924920_2/>
>>>>>>
>>>>>> This implementation was straightforward because it used an 
>>>>>> existing vm
>>>>>> hash function.
>>>>>>
>>>>>> Thanks, Harold
>>>>>>
>>>>>> On 11/13/2012 5:05 PM, harold seigel wrote:
>>>>>>> HI David,
>>>>>>>
>>>>>>> Thanks for your comments. However, we still think that appending a
>>>>>>> hash code is worth doing. As you point out, this is not a fool 
>>>>>>> proof
>>>>>>> scheme because of potential hash collisions, but it will provide a
>>>>>>> stronger sanity check than just comparing truncated strings.
>>>>>>>
>>>>>>> Also, since this hash will only occur when writing to or opening 
>>>>>>> the
>>>>>>> shared area, the cost of doing the hash is not significant.
>>>>>>>
>>>>>>> I plan to send out new code to review this week.
>>>>>>>
>>>>>>> Thanks, Harold
>>>>>>>
>>>>>>> On 11/9/2012 6:12 PM, David Holmes wrote:
>>>>>>>> On 10/11/2012 6:08 AM, harold seigel wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> We plan to use something similar to what you suggested. If the
>>>>>>>>> version
>>>>>>>>> string is < JVM_IDENT_MAX then it will not get truncated. Any
>>>>>>>>> version
>>>>>>>>> string >= JVM_IDENT_MAX will get truncated and the last N bytes
>>>>>>>>> of the
>>>>>>>>> string will be filled in with hex digits from a hash. Code that
>>>>>>>>> then
>>>>>>>>> reads this version string can tell whether or not the string has
>>>>>>>>> been
>>>>>>>>> hashed based on its length.
>>>>>>>>
>>>>>>>> This seems excessive. You could still get a hash collision and the
>>>>>>>> likelihood of that depends on the hashing algorithm and the likely
>>>>>>>> differences in such long version strings (which you don't know).
>>>>>>>> Plus
>>>>>>>> you can always build two quite different VMs that have the same
>>>>>>>> version string!
>>>>>>>>
>>>>>>>> As I understand it this is meant to be a basic sanity check not a
>>>>>>>> foolproof identification scheme - if you want the latter then
>>>>>>>> generate a GUID on every build and use that.
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thanks for the suggestion!
>>>>>>>>> Harold
>>>>>>>>>
>>>>>>>>> On 11/9/2012 11:46 AM, Peter Levart wrote:
>>>>>>>>>> On 11/09/2012 05:40 AM, David Holmes wrote:
>>>>>>>>>>> On 9/11/2012 5:48 AM, Coleen Phillimore wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Harold,
>>>>>>>>>>>> I looked at this once and thought it was okay, but now I don't
>>>>>>>>>>>> think it
>>>>>>>>>>>> is. I think you want fail_stop() but just not to print the
>>>>>>>>>>>> string
>>>>>>>>>>>> that
>>>>>>>>>>>> isn't initialized.
>>>>>>>>>>>> fail_continue() is used when reading the archive. It allows 
>>>>>>>>>>>> the
>>>>>>>>>>>> vm to
>>>>>>>>>>>> continue without using the shared archive. But when you're
>>>>>>>>>>>> dumping it,
>>>>>>>>>>>> if the string is truncated, it won't create a usable shared
>>>>>>>>>>>> archive
>>>>>>>>>>>> anyway. The fail_continue() closes the CDS archive file.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want to have a truncated string work, you have issue 
>>>>>>>>>>>> the
>>>>>>>>>>>> fail_continue() message as a warning only. Then change the
>>>>>>>>>>>> code for
>>>>>>>>>>>> Xshare:on that validates that the versions match to 
>>>>>>>>>>>> truncate the
>>>>>>>>>>>> current_version() too.
>>>>>>>>>>>>
>>>>>>>>>>>> I suggest leaving it as fail_stop() but fixing the message to
>>>>>>>>>>>> not
>>>>>>>>>>>> print
>>>>>>>>>>>> uninitialized data.
>>>>>>>>>>>
>>>>>>>>>>> I agree that fail_continue is the wrong choice. But it seems
>>>>>>>>>>> to me
>>>>>>>>>>> that when the ident is truncated all that is needed is a 
>>>>>>>>>>> warning
>>>>>>>>>>> printed. AFAICS the archive will still be usable with a 
>>>>>>>>>>> truncated
>>>>>>>>>>> ident and it will match when read back in during validate().
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> But wouldn't also match when read back with a version string 
>>>>>>>>>> that
>>>>>>>>>> differs from the writer's version string only in chars past 255?
>>>>>>>>>> Are
>>>>>>>>>> those chars significant?
>>>>>>>>>>
>>>>>>>>>> What about truncating the version string to 255-32 = 223 
>>>>>>>>>> bytes and
>>>>>>>>>> appending (only if truncated) a 32 hex digits of a MD5 of the
>>>>>>>>>> whole
>>>>>>>>>> version string? No warnings.
>>>>>>>>>>
>>>>>>>>>> Regards, Peter
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> ------
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Coleen
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/8/2012 1:23 PM, harold seigel wrote:
>>>>>>>>>>>>> Please review the following change.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Summary: The problem was fixed by truncating the JVM ident to
>>>>>>>>>>>>> JVM_IDENT_MAX and calling method fail_continue() instead of
>>>>>>>>>>>>> fail_stop().
>>>>>>>>>>>>>
>>>>>>>>>>>>> Open webrev at 
>>>>>>>>>>>>> http://cr.openjdk.java.net/~hseigel/bug_6924920/
>>>>>>>>>>>>> <http://cr.openjdk.java.net/%7Ehseigel/bug_6924920/>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bug link at http://bugs.sun.com/view_bug.do?bug_id=6924920
>>>>>>>>>>>>>
>>>>>>>>>>>>> The changes were tested with JPRT, JCK, and by using a 
>>>>>>>>>>>>> version
>>>>>>>>>>>>> string
>>>>>>>>>>>>> that exceeded 256 characters.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Harold
>>>>>>>>>>