RFR(S): 8240295: hs_err elapsed time in seconds is not accurate enough

David Holmes david.holmes at oracle.com
Thu Mar 5 10:38:58 UTC 2020


Thanks Kevin. I think this is the less risky change and achieves the goal.

David

On 5/03/2020 8:00 pm, Kevin Walls wrote:
> Thanks -
> 
> I had tried some ideas in the simple fashion, and we can use %06d 
> formatting.... OK maybe such formatting is not as "bad" as %f...
> 
> (glibc parses the int width specified without allocation.  We provide 
> the output buffer, I don't think we will cause  vfprintf code to alloca 
> or malloc.)
> 
> I can offer a second version below that uses %d only.  Testing alongside 
> %f in the same line, it retains the same value and position, e.g.
> 
> Time: Thu Mar  5 08:57:50 2020 UTC elapsed time: f: 2.001065 int: 
> 2.001065 (raw int: 1065) seconds (0d 0h 0m 2s)
> 
> Output example from the hg diff below (not from the same run):
> 
> Time: Thu Mar  5 09:28:01 2020 UTC elapsed time: 2.000611 seconds (0d 0h 
> 0m 2s)
> 
> 
> Thanks!
> Kevin
> 
> 
> $ hg diff
> diff --git a/src/hotspot/share/runtime/os.cpp 
> b/src/hotspot/share/runtime/os.cpp
> --- a/src/hotspot/share/runtime/os.cpp
> +++ b/src/hotspot/share/runtime/os.cpp
> @@ -1016,10 +1016,9 @@
>     }
> 
>     double t = os::elapsedTime();
> -  // NOTE: It tends to crash after a SEGV if we want to 
> printf("%f",...) in
> -  //       Linux. Must be a bug in glibc ? Workaround is to round "t" 
> to int
> -  //       before printf. We lost some precision, but who cares?
> +  // NOTE: a crash using printf("%f",...) on Linux was historically 
> noted here.
>     int eltime = (int)t;  // elapsed time in seconds
> +  int eltimeFraction = (int) ((t - eltime) * 1000000);
> 
>     // print elapsed time in a human-readable format:
>     int eldays = eltime / secs_per_day;
> @@ -1029,7 +1028,7 @@
>     int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
>     int minute_secs = elmins * secs_per_min;
>     int elsecs = (eltime - day_secs - hour_secs - minute_secs);
> -  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", eltime, 
> eldays, elhours, elmins, elsecs);
> +  st->print_cr(" elapsed time: %d.%06d seconds (%dd %dh %dm %ds)", 
> eltime, eltimeFraction, eldays, elhours, elmins, elsecs);
>   }
> 
> 
> 
> On 05/03/2020 00:57, David Holmes wrote:
>> On 4/03/2020 8:44 am, Kevin Walls wrote:
>>>
>>> Thanks David -
>>>
>>> Yes there are situations where hs_err fails, and few people are 
>>> sadder than me
>>> when that happens 8-) , so I was thinking about how scared to be by 
>>> the comment.
>>>
>>> With the safety net of the error handler for the steps of the hs_err 
>>> file
>>> (which works well, we see it invoked frequently), it looks reasonable 
>>> to use
>>> %f as we might do other slightly questionable things for a signal 
>>> handler.
>>>
>>> Corrupting locale information or floating point state might possibly 
>>> cause
>>> problems, but if I cause a fake crash in print_date_and_time the error
>>> handler recovers and the report continues.
>>
>> That is good to know.
>>
>>> Thinking about printing with two ints, seconds and fractions:
>>> I don't see anything already that returns such a time in two 
>>> components in the
>>> JVM, so we might implement a new form of os::javaTimeNanos() or 
>>> similar that
>>> returns the two parts, and do that on each platform.
>>
>> I was thinking of something simple/crude ...
>>
>>> I didn't yet come up with anything to do in os::print_date_and_time()
>>> which will take the fractional part of the double, and print just the 
>>> fraction as an int, without using any library / %f facilities.
>>
>> ... just using e.g. (untested)
>>
>> double t = os::elapsedTime();
>> int secs =  (int) t;
>> int micros =  (int)((t - secs) * 100000);
>> printf("%d.%d", secs, micros);
>>
>> with appropriate width specifiers to get the formatting right.
>>
>> Cheers,
>> David
>>
>>>
>>> If you're still concerned I could revisit these or some other idea.
>>>
>>> Genuine laugh out loud moment for me, I backported the elapsed time 
>>> logging from
>>> 6u4 to 5u19  (https://bugs.openjdk.java.net/browse/JDK-6447157) (2007).
>>> (I said before jdk5 was created, I should have said before it was in 
>>> mercurial.)
>>>
>>> Thanks
>>> Kevin
>>>
>>>
>>> On 03/03/2020 01:11, David Holmes wrote:
>>>> Hi Kevin,
>>>>
>>>> On 2/03/2020 8:48 pm, Kevin Walls wrote:
>>>>> Oops, and with the bug ID in the title and JBS link:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8240295
>>>>>
>>>>>
>>>>> On 02/03/2020 10:47, Kevin Walls wrote:
>>>>>> Hi,
>>>>>>
>>>>>> (s11y and runtime opinions both relevant)
>>>>>>
>>>>>> A few times in the last month I've really wanted to compare the 
>>>>>> Events logged in the hs_err file, and the time of the JVM's crash.
>>>>>>
>>>>>> "elapsed time" in hs_err is only accurate to one second, and has 
>>>>>> been since before jdk5 was created.
>>>>>>
>>>>>> The diff below changes the format string and uses the non-rounded 
>>>>>> time value (I don't see a need to change the other integer 
>>>>>> arithmetic here), and we can enjoy hs_errs with detail like:
>>>>>>
>>>>>> ...
>>>>>> Time: Mon Mar  2 09:57:13 2020 UTC elapsed time: 5.490135 seconds 
>>>>>> (0d 0h 0m 5s)
>>>>>> ...
>>>>>>
>>>>>> Thanks
>>>>>> Kevin
>>>>>>
>>>>>>
>>>>>> /jdk/open$ hg diff
>>>>>> diff --git a/src/hotspot/share/runtime/os.cpp 
>>>>>> b/src/hotspot/share/runtime/os.cpp
>>>>>> --- a/src/hotspot/share/runtime/os.cpp
>>>>>> +++ b/src/hotspot/share/runtime/os.cpp
>>>>>> @@ -1016,9 +1016,8 @@
>>>>>>    }
>>>>>>
>>>>>>    double t = os::elapsedTime();
>>>>>> -  // NOTE: It tends to crash after a SEGV if we want to 
>>>>>> printf("%f",...) in
>>>>>> -  //       Linux. Must be a bug in glibc ? Workaround is to round 
>>>>>> "t" to int
>>>>>> -  //       before printf. We lost some precision, but who cares?
>>>>>> +  // NOTE: a crash using printf("%f",...) on Linux was 
>>>>>> historically noted here
>>>>>> +  //       (before the jdk5 repo was created).
>>>>
>>>> Just because it is old doesn't mean it no longer applies. printf is 
>>>> not async-signal-safe - we know that but we try to use it anyway. 
>>>> Maybe %f is even less async-signal-safe?
>>>>
>>>> This may get through testing okay but cause problems with real 
>>>> crashes in the field.
>>>>
>>>> What about breaking the time up into two ints: seconds and nanos?
>>>>
>>>> Cheers,
>>>> David
>>>> -----
>>>>
>>>>>>    int eltime = (int)t;  // elapsed time in seconds
>>>>>>
>>>>>>    // print elapsed time in a human-readable format:
>>>>>> @@ -1029,7 +1028,7 @@
>>>>>>    int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
>>>>>>    int minute_secs = elmins * secs_per_min;
>>>>>>    int elsecs = (eltime - day_secs - hour_secs - minute_secs);
>>>>>> -  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", 
>>>>>> eltime, eldays, elhours, elmins, elsecs);
>>>>>> +  st->print_cr(" elapsed time: %f seconds (%dd %dh %dm %ds)", t, 
>>>>>> eldays, elhours, elmins, elsecs);
>>>>>>  }
>>>>>>
>>>>>>


More information about the serviceability-dev mailing list