[13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out

Ioi Lam ioi.lam at oracle.com
Wed Jul 17 03:26:54 UTC 2019


Looks good. Thanks!

- Ioi

On 7/16/19 8:09 PM, Calvin Cheung wrote:
> Removing the fc.force(true) calls works and the test performs better 
> on macosx. It takes about 20s for the test to finish without the call 
> vs. up to 9 min. with the call.
>
> updated webrev:
>
>     http://cr.openjdk.java.net/~ccheung/8227646/webrev.01/
>
> Ran tier1 and 2 tests successfully and repeated tier2 test 10 times on 
> a Mac host from which the timeout was observed.
>
> thanks,
>
> Calvin
>
> On 7/16/19 9:09 AM, Calvin Cheung wrote:
>> Hi Ioi,
>>
>> I found 2 fc.force(true) calls in the test. I've removed both and 
>> testing it without increasing the timeout value.
>>
>> I test it by running the hotspot_tier2_runtime test group 10 times on 
>> mac hosts. Each iteration takes about 30 min. Will let you know about 
>> the results.
>>
>> thanks,
>>
>> Calvin
>>
>> On 7/16/19 8:25 AM, Ioi Lam wrote:
>>> HI Calvin,
>>>
>>> Since the test is stuck at here at the timeout:
>>>
>>> at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method)
>>> at 
>>> sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82)
>>> at 
>>> sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461)
>>> at 
>>> SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166)
>>>
>>> Maybe we should remove the calls to FileChannel.force()? According 
>>> to the javadoc, this call is for "ensuring that critical information 
>>> is not lost in the event of a system crash", which I think is not 
>>> necessary in our test.
>>>
>>> src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c:
>>>
>>> JNIEXPORT jint JNICALL
>>> Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this,
>>>                                           jobject fdo, jboolean md)
>>> {
>>>     jint fd = fdval(env, fdo);
>>>     int result = 0;
>>>
>>> #ifdef MACOSX
>>>     result = fcntl(fd, F_FULLFSYNC);
>>>     if (result == -1 && errno == ENOTSUP) {
>>>         /* Try fsync() in case F_FULLSYUNC is not implemented on the 
>>> file system. */
>>>         result = fsync(fd);
>>>     }
>>>
>>> https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html 
>>>
>>>
>>>      F_FULLFSYNC        Does the same thing as fsync(2) then asks 
>>> the drive to
>>>                         flush all buffered data to the permanent 
>>> storage
>>>                         device (arg is ignored).  This is currently 
>>> implemented
>>>                         on HFS, MS-DOS (FAT), and Universal Disk Format
>>>                         (UDF) file systems.  The operation may take 
>>> quite a
>>>                         while to complete.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>> On 7/16/19 7:59 AM, Calvin Cheung wrote:
>>>> Dan,
>>>>
>>>> Thanks for your review!
>>>>
>>>> On 7/16/19 5:56 AM, Daniel D. Daugherty wrote:
>>>>> On 7/16/19 12:31 AM, Calvin Cheung wrote:
>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646
>>>>>>
>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/
>>>>>
>>>>> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>>>     Does the test intentionally crash in one or more of the test 
>>>>> cases?
>>>>>     If not, then '-XX:-CreateCoredumpOnCrash' is not really needed.
>>>>>     I don't think '-XX:-CreateCoredumpOnCrash' will prevent the 
>>>>> timeout
>>>>>     handling mechanism from trying to capture a core file in the case
>>>>>     of a timeout.
>>>> No, the test does not crash intentionally. Thanks for clarifying 
>>>> the -XX:-CreateCoredumpOnCrash. I will revert the change.
>>>>>
>>>>>     The test currently timed out with a default total timeout 
>>>>> value of
>>>>>     480 seconds; that 480 comes from the default timeout value of 120
>>>>>     seconds and the default timeout factor of 4 (480 == 120 * 4).
>>>>>
>>>>>     The 'timeout=1000' value will get you a total timeout value of 
>>>>> 4000.
>>>>>     I suspect that is not what you want.
>>>>>
>>>>>     If you specify 'timeout=240', you'll get a total timeout value of
>>>>>     960 seconds (240 * 4).
>>>>
>>>> I've seen the total elapsed time for the test got very close to 
>>>> 960s. So to be on the safe side, I would set the timeout=300 as 
>>>> follows:
>>>>
>>>> diff --git 
>>>> a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java 
>>>> b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>> --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>> +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>> @@ -35,7 +35,7 @@
>>>>   * @build sun.hotspot.WhiteBox
>>>>   * @compile test-classes/Hello.java
>>>>   * @run driver ClassFileInstaller sun.hotspot.WhiteBox
>>>> - * @run main/othervm -Xbootclasspath/a:. 
>>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI 
>>>> SharedArchiveConsistency
>>>> + * @run main/othervm/timeout=300 -Xbootclasspath/a:. 
>>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI 
>>>> SharedArchiveConsistency
>>>>   */
>>>>  import jdk.test.lib.process.OutputAnalyzer;
>>>>  import jdk.test.lib.Utils;
>>>>
>>>> I will do more testing with the above timeout before pushing the 
>>>> change.
>>>>
>>>> Let me know if you'd like to see another webrev.
>>>>
>>>> thanks,
>>>>
>>>> Calvin
>>>>
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Increase the timeout to 1000s and add the 
>>>>>> -XX:-CreateCoredumpOnCrash option to disable coredump.
>>>>>>
>>>>>> Testing: on 2 macosx hosts on which the timeout was observed.
>>>>>>
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Calvin
>>>>>>
>>>>>
>>>



More information about the hotspot-runtime-dev mailing list