RFR(S): 8240295: hs_err elapsed time in seconds is not accurate enough

David Holmes david.holmes at oracle.com
Thu Mar 5 00:57:35 UTC 2020


On 4/03/2020 8:44 am, Kevin Walls wrote:
> 
> Thanks David -
> 
> Yes there are situations where hs_err fails, and few people are sadder 
> than me
> when that happens 8-) , so I was thinking about how scared to be by the 
> comment.
> 
> With the safety net of the error handler for the steps of the hs_err file
> (which works well, we see it invoked frequently), it looks reasonable to 
> use
> %f as we might do other slightly questionable things for a signal handler.
> 
> Corrupting locale information or floating point state might possibly cause
> problems, but if I cause a fake crash in print_date_and_time the error
> handler recovers and the report continues.

That is good to know.

> Thinking about printing with two ints, seconds and fractions:
> I don't see anything already that returns such a time in two components 
> in the
> JVM, so we might implement a new form of os::javaTimeNanos() or similar 
> that
> returns the two parts, and do that on each platform.

I was thinking of something simple/crude ...

> I didn't yet come up with anything to do in os::print_date_and_time()
> which will take the fractional part of the double, and print just the 
> fraction as an int, without using any library / %f facilities.

... just using e.g. (untested)

double t = os::elapsedTime();
int secs =  (int) t;
int micros =  (int)((t - secs) * 100000);
printf("%d.%d", secs, micros);

with appropriate width specifiers to get the formatting right.

Cheers,
David

> 
> If you're still concerned I could revisit these or some other idea.
> 
> Genuine laugh out loud moment for me, I backported the elapsed time 
> logging from
> 6u4 to 5u19  (https://bugs.openjdk.java.net/browse/JDK-6447157) (2007).
> (I said before jdk5 was created, I should have said before it was in 
> mercurial.)
> 
> Thanks
> Kevin
> 
> 
> On 03/03/2020 01:11, David Holmes wrote:
>> Hi Kevin,
>>
>> On 2/03/2020 8:48 pm, Kevin Walls wrote:
>>> Oops, and with the bug ID in the title and JBS link:
>>> https://bugs.openjdk.java.net/browse/JDK-8240295
>>>
>>>
>>> On 02/03/2020 10:47, Kevin Walls wrote:
>>>> Hi,
>>>>
>>>> (s11y and runtime opinions both relevant)
>>>>
>>>> A few times in the last month I've really wanted to compare the 
>>>> Events logged in the hs_err file, and the time of the JVM's crash.
>>>>
>>>> "elapsed time" in hs_err is only accurate to one second, and has 
>>>> been since before jdk5 was created.
>>>>
>>>> The diff below changes the format string and uses the non-rounded 
>>>> time value (I don't see a need to change the other integer 
>>>> arithmetic here), and we can enjoy hs_errs with detail like:
>>>>
>>>> ...
>>>> Time: Mon Mar  2 09:57:13 2020 UTC elapsed time: 5.490135 seconds 
>>>> (0d 0h 0m 5s)
>>>> ...
>>>>
>>>> Thanks
>>>> Kevin
>>>>
>>>>
>>>> /jdk/open$ hg diff
>>>> diff --git a/src/hotspot/share/runtime/os.cpp 
>>>> b/src/hotspot/share/runtime/os.cpp
>>>> --- a/src/hotspot/share/runtime/os.cpp
>>>> +++ b/src/hotspot/share/runtime/os.cpp
>>>> @@ -1016,9 +1016,8 @@
>>>>    }
>>>>
>>>>    double t = os::elapsedTime();
>>>> -  // NOTE: It tends to crash after a SEGV if we want to 
>>>> printf("%f",...) in
>>>> -  //       Linux. Must be a bug in glibc ? Workaround is to round 
>>>> "t" to int
>>>> -  //       before printf. We lost some precision, but who cares?
>>>> +  // NOTE: a crash using printf("%f",...) on Linux was historically 
>>>> noted here
>>>> +  //       (before the jdk5 repo was created).
>>
>> Just because it is old doesn't mean it no longer applies. printf is 
>> not async-signal-safe - we know that but we try to use it anyway. 
>> Maybe %f is even less async-signal-safe?
>>
>> This may get through testing okay but cause problems with real crashes 
>> in the field.
>>
>> What about breaking the time up into two ints: seconds and nanos?
>>
>> Cheers,
>> David
>> -----
>>
>>>>    int eltime = (int)t;  // elapsed time in seconds
>>>>
>>>>    // print elapsed time in a human-readable format:
>>>> @@ -1029,7 +1028,7 @@
>>>>    int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
>>>>    int minute_secs = elmins * secs_per_min;
>>>>    int elsecs = (eltime - day_secs - hour_secs - minute_secs);
>>>> -  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", 
>>>> eltime, eldays, elhours, elmins, elsecs);
>>>> +  st->print_cr(" elapsed time: %f seconds (%dd %dh %dm %ds)", t, 
>>>> eldays, elhours, elmins, elsecs);
>>>>  }
>>>>
>>>>


More information about the serviceability-dev mailing list