RFR(S): 8240295: hs_err elapsed time in seconds is not accurate enough

serguei.spitsyn at oracle.com serguei.spitsyn at oracle.com
Sat Mar 7 07:53:21 UTC 2020


Hi Kevin,

This looks okay to me as well.

Thanks,
Serguei


On 3/5/20 02:38, David Holmes wrote:
> Thanks Kevin. I think this is the less risky change and achieves the 
> goal.
>
> David
>
> On 5/03/2020 8:00 pm, Kevin Walls wrote:
>> Thanks -
>>
>> I had tried some ideas in the simple fashion, and we can use %06d 
>> formatting.... OK maybe such formatting is not as "bad" as %f...
>>
>> (glibc parses the int width specified without allocation.  We provide 
>> the output buffer, I don't think we will cause  vfprintf code to 
>> alloca or malloc.)
>>
>> I can offer a second version below that uses %d only.  Testing 
>> alongside %f in the same line, it retains the same value and 
>> position, e.g.
>>
>> Time: Thu Mar  5 08:57:50 2020 UTC elapsed time: f: 2.001065 int: 
>> 2.001065 (raw int: 1065) seconds (0d 0h 0m 2s)
>>
>> Output example from the hg diff below (not from the same run):
>>
>> Time: Thu Mar  5 09:28:01 2020 UTC elapsed time: 2.000611 seconds (0d 
>> 0h 0m 2s)
>>
>>
>> Thanks!
>> Kevin
>>
>>
>> $ hg diff
>> diff --git a/src/hotspot/share/runtime/os.cpp 
>> b/src/hotspot/share/runtime/os.cpp
>> --- a/src/hotspot/share/runtime/os.cpp
>> +++ b/src/hotspot/share/runtime/os.cpp
>> @@ -1016,10 +1016,9 @@
>>     }
>>
>>     double t = os::elapsedTime();
>> -  // NOTE: It tends to crash after a SEGV if we want to 
>> printf("%f",...) in
>> -  //       Linux. Must be a bug in glibc ? Workaround is to round 
>> "t" to int
>> -  //       before printf. We lost some precision, but who cares?
>> +  // NOTE: a crash using printf("%f",...) on Linux was historically 
>> noted here.
>>     int eltime = (int)t;  // elapsed time in seconds
>> +  int eltimeFraction = (int) ((t - eltime) * 1000000);
>>
>>     // print elapsed time in a human-readable format:
>>     int eldays = eltime / secs_per_day;
>> @@ -1029,7 +1028,7 @@
>>     int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
>>     int minute_secs = elmins * secs_per_min;
>>     int elsecs = (eltime - day_secs - hour_secs - minute_secs);
>> -  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", 
>> eltime, eldays, elhours, elmins, elsecs);
>> +  st->print_cr(" elapsed time: %d.%06d seconds (%dd %dh %dm %ds)", 
>> eltime, eltimeFraction, eldays, elhours, elmins, elsecs);
>>   }
>>
>>
>>
>> On 05/03/2020 00:57, David Holmes wrote:
>>> On 4/03/2020 8:44 am, Kevin Walls wrote:
>>>>
>>>> Thanks David -
>>>>
>>>> Yes there are situations where hs_err fails, and few people are 
>>>> sadder than me
>>>> when that happens 8-) , so I was thinking about how scared to be by 
>>>> the comment.
>>>>
>>>> With the safety net of the error handler for the steps of the 
>>>> hs_err file
>>>> (which works well, we see it invoked frequently), it looks 
>>>> reasonable to use
>>>> %f as we might do other slightly questionable things for a signal 
>>>> handler.
>>>>
>>>> Corrupting locale information or floating point state might 
>>>> possibly cause
>>>> problems, but if I cause a fake crash in print_date_and_time the error
>>>> handler recovers and the report continues.
>>>
>>> That is good to know.
>>>
>>>> Thinking about printing with two ints, seconds and fractions:
>>>> I don't see anything already that returns such a time in two 
>>>> components in the
>>>> JVM, so we might implement a new form of os::javaTimeNanos() or 
>>>> similar that
>>>> returns the two parts, and do that on each platform.
>>>
>>> I was thinking of something simple/crude ...
>>>
>>>> I didn't yet come up with anything to do in os::print_date_and_time()
>>>> which will take the fractional part of the double, and print just 
>>>> the fraction as an int, without using any library / %f facilities.
>>>
>>> ... just using e.g. (untested)
>>>
>>> double t = os::elapsedTime();
>>> int secs =  (int) t;
>>> int micros =  (int)((t - secs) * 100000);
>>> printf("%d.%d", secs, micros);
>>>
>>> with appropriate width specifiers to get the formatting right.
>>>
>>> Cheers,
>>> David
>>>
>>>>
>>>> If you're still concerned I could revisit these or some other idea.
>>>>
>>>> Genuine laugh out loud moment for me, I backported the elapsed time 
>>>> logging from
>>>> 6u4 to 5u19 (https://bugs.openjdk.java.net/browse/JDK-6447157) (2007).
>>>> (I said before jdk5 was created, I should have said before it was 
>>>> in mercurial.)
>>>>
>>>> Thanks
>>>> Kevin
>>>>
>>>>
>>>> On 03/03/2020 01:11, David Holmes wrote:
>>>>> Hi Kevin,
>>>>>
>>>>> On 2/03/2020 8:48 pm, Kevin Walls wrote:
>>>>>> Oops, and with the bug ID in the title and JBS link:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8240295
>>>>>>
>>>>>>
>>>>>> On 02/03/2020 10:47, Kevin Walls wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> (s11y and runtime opinions both relevant)
>>>>>>>
>>>>>>> A few times in the last month I've really wanted to compare the 
>>>>>>> Events logged in the hs_err file, and the time of the JVM's crash.
>>>>>>>
>>>>>>> "elapsed time" in hs_err is only accurate to one second, and has 
>>>>>>> been since before jdk5 was created.
>>>>>>>
>>>>>>> The diff below changes the format string and uses the 
>>>>>>> non-rounded time value (I don't see a need to change the other 
>>>>>>> integer arithmetic here), and we can enjoy hs_errs with detail 
>>>>>>> like:
>>>>>>>
>>>>>>> ...
>>>>>>> Time: Mon Mar  2 09:57:13 2020 UTC elapsed time: 5.490135 
>>>>>>> seconds (0d 0h 0m 5s)
>>>>>>> ...
>>>>>>>
>>>>>>> Thanks
>>>>>>> Kevin
>>>>>>>
>>>>>>>
>>>>>>> /jdk/open$ hg diff
>>>>>>> diff --git a/src/hotspot/share/runtime/os.cpp 
>>>>>>> b/src/hotspot/share/runtime/os.cpp
>>>>>>> --- a/src/hotspot/share/runtime/os.cpp
>>>>>>> +++ b/src/hotspot/share/runtime/os.cpp
>>>>>>> @@ -1016,9 +1016,8 @@
>>>>>>>    }
>>>>>>>
>>>>>>>    double t = os::elapsedTime();
>>>>>>> -  // NOTE: It tends to crash after a SEGV if we want to 
>>>>>>> printf("%f",...) in
>>>>>>> -  //       Linux. Must be a bug in glibc ? Workaround is to 
>>>>>>> round "t" to int
>>>>>>> -  //       before printf. We lost some precision, but who cares?
>>>>>>> +  // NOTE: a crash using printf("%f",...) on Linux was 
>>>>>>> historically noted here
>>>>>>> +  //       (before the jdk5 repo was created).
>>>>>
>>>>> Just because it is old doesn't mean it no longer applies. printf 
>>>>> is not async-signal-safe - we know that but we try to use it 
>>>>> anyway. Maybe %f is even less async-signal-safe?
>>>>>
>>>>> This may get through testing okay but cause problems with real 
>>>>> crashes in the field.
>>>>>
>>>>> What about breaking the time up into two ints: seconds and nanos?
>>>>>
>>>>> Cheers,
>>>>> David
>>>>> -----
>>>>>
>>>>>>>    int eltime = (int)t;  // elapsed time in seconds
>>>>>>>
>>>>>>>    // print elapsed time in a human-readable format:
>>>>>>> @@ -1029,7 +1028,7 @@
>>>>>>>    int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
>>>>>>>    int minute_secs = elmins * secs_per_min;
>>>>>>>    int elsecs = (eltime - day_secs - hour_secs - minute_secs);
>>>>>>> -  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", 
>>>>>>> eltime, eldays, elhours, elmins, elsecs);
>>>>>>> +  st->print_cr(" elapsed time: %f seconds (%dd %dh %dm %ds)", 
>>>>>>> t, eldays, elhours, elmins, elsecs);
>>>>>>>  }
>>>>>>>
>>>>>>>



More information about the serviceability-dev mailing list