[13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out

Calvin Cheung calvin.cheung at oracle.com
Wed Jul 17 15:37:22 UTC 2019


Dan, Ioi,

Thanks for taking another look.

I've pushed the changeset.

Calvin

On 7/17/19 6:41 AM, Daniel D. Daugherty wrote:
> On 7/16/19 11:09 PM, Calvin Cheung wrote:
>> Removing the fc.force(true) calls works and the test performs better 
>> on macosx. It takes about 20s for the test to finish without the call 
>> vs. up to 9 min. with the call.
>>
>> updated webrev:
>>
>>     http://cr.openjdk.java.net/~ccheung/8227646/webrev.01/
>
> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>     I'm assuming that the fc.force(true) calls were there out of an
>     abundance of caution/paranoia and that there is nothing in the
>     test that is relying on the data being properly flushed to disk.
>
>     Ioi states: "which I think is not necessary in our test"
>
>     and I don't see anything obvious in the test is relying on
>     complete data flushing.
>
> Thumbs up.
>
> Dan
>
>
>>
>> Ran tier1 and 2 tests successfully and repeated tier2 test 10 times 
>> on a Mac host from which the timeout was observed.
>>
>> thanks,
>>
>> Calvin
>>
>> On 7/16/19 9:09 AM, Calvin Cheung wrote:
>>> Hi Ioi,
>>>
>>> I found 2 fc.force(true) calls in the test. I've removed both and 
>>> testing it without increasing the timeout value.
>>>
>>> I test it by running the hotspot_tier2_runtime test group 10 times 
>>> on mac hosts. Each iteration takes about 30 min. Will let you know 
>>> about the results.
>>>
>>> thanks,
>>>
>>> Calvin
>>>
>>> On 7/16/19 8:25 AM, Ioi Lam wrote:
>>>> HI Calvin,
>>>>
>>>> Since the test is stuck at here at the timeout:
>>>>
>>>> at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method)
>>>> at 
>>>> sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82)
>>>> at 
>>>> sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461)
>>>> at 
>>>> SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166)
>>>>
>>>> Maybe we should remove the calls to FileChannel.force()? According 
>>>> to the javadoc, this call is for "ensuring that critical 
>>>> information is not lost in the event of a system crash", which I 
>>>> think is not necessary in our test.
>>>>
>>>> src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c:
>>>>
>>>> JNIEXPORT jint JNICALL
>>>> Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this,
>>>>                                           jobject fdo, jboolean md)
>>>> {
>>>>     jint fd = fdval(env, fdo);
>>>>     int result = 0;
>>>>
>>>> #ifdef MACOSX
>>>>     result = fcntl(fd, F_FULLFSYNC);
>>>>     if (result == -1 && errno == ENOTSUP) {
>>>>         /* Try fsync() in case F_FULLSYUNC is not implemented on 
>>>> the file system. */
>>>>         result = fsync(fd);
>>>>     }
>>>>
>>>> https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html 
>>>>
>>>>
>>>>      F_FULLFSYNC        Does the same thing as fsync(2) then asks 
>>>> the drive to
>>>>                         flush all buffered data to the permanent 
>>>> storage
>>>>                         device (arg is ignored).  This is currently 
>>>> implemented
>>>>                         on HFS, MS-DOS (FAT), and Universal Disk 
>>>> Format
>>>>                         (UDF) file systems.  The operation may take 
>>>> quite a
>>>>                         while to complete.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>> On 7/16/19 7:59 AM, Calvin Cheung wrote:
>>>>> Dan,
>>>>>
>>>>> Thanks for your review!
>>>>>
>>>>> On 7/16/19 5:56 AM, Daniel D. Daugherty wrote:
>>>>>> On 7/16/19 12:31 AM, Calvin Cheung wrote:
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646
>>>>>>>
>>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/
>>>>>>
>>>>>> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>>>>     Does the test intentionally crash in one or more of the test 
>>>>>> cases?
>>>>>>     If not, then '-XX:-CreateCoredumpOnCrash' is not really needed.
>>>>>>     I don't think '-XX:-CreateCoredumpOnCrash' will prevent the 
>>>>>> timeout
>>>>>>     handling mechanism from trying to capture a core file in the 
>>>>>> case
>>>>>>     of a timeout.
>>>>> No, the test does not crash intentionally. Thanks for clarifying 
>>>>> the -XX:-CreateCoredumpOnCrash. I will revert the change.
>>>>>>
>>>>>>     The test currently timed out with a default total timeout 
>>>>>> value of
>>>>>>     480 seconds; that 480 comes from the default timeout value of 
>>>>>> 120
>>>>>>     seconds and the default timeout factor of 4 (480 == 120 * 4).
>>>>>>
>>>>>>     The 'timeout=1000' value will get you a total timeout value 
>>>>>> of 4000.
>>>>>>     I suspect that is not what you want.
>>>>>>
>>>>>>     If you specify 'timeout=240', you'll get a total timeout 
>>>>>> value of
>>>>>>     960 seconds (240 * 4).
>>>>>
>>>>> I've seen the total elapsed time for the test got very close to 
>>>>> 960s. So to be on the safe side, I would set the timeout=300 as 
>>>>> follows:
>>>>>
>>>>> diff --git 
>>>>> a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java 
>>>>> b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>>> --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>>> +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java
>>>>> @@ -35,7 +35,7 @@
>>>>>   * @build sun.hotspot.WhiteBox
>>>>>   * @compile test-classes/Hello.java
>>>>>   * @run driver ClassFileInstaller sun.hotspot.WhiteBox
>>>>> - * @run main/othervm -Xbootclasspath/a:. 
>>>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI 
>>>>> SharedArchiveConsistency
>>>>> + * @run main/othervm/timeout=300 -Xbootclasspath/a:. 
>>>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI 
>>>>> SharedArchiveConsistency
>>>>>   */
>>>>>  import jdk.test.lib.process.OutputAnalyzer;
>>>>>  import jdk.test.lib.Utils;
>>>>>
>>>>> I will do more testing with the above timeout before pushing the 
>>>>> change.
>>>>>
>>>>> Let me know if you'd like to see another webrev.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Calvin
>>>>>
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Increase the timeout to 1000s and add the 
>>>>>>> -XX:-CreateCoredumpOnCrash option to disable coredump.
>>>>>>>
>>>>>>> Testing: on 2 macosx hosts on which the timeout was observed.
>>>>>>>
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Calvin
>>>>>>>
>>>>>>
>>>>
>


More information about the hotspot-runtime-dev mailing list