RFR(S): 8240295: hs_err elapsed time in seconds is not accurate enough

Kevin Walls kevin.walls at oracle.com
Thu Mar 5 10:00:24 UTC 2020


Thanks -

I had tried some ideas in the simple fashion, and we can use %06d 
formatting.... OK maybe such formatting is not as "bad" as %f...

(glibc parses the int width specified without allocation.  We provide 
the output buffer, I don't think we will cause  vfprintf code to alloca 
or malloc.)

I can offer a second version below that uses %d only.  Testing alongside 
%f in the same line, it retains the same value and position, e.g.

Time: Thu Mar  5 08:57:50 2020 UTC elapsed time: f: 2.001065 int: 
2.001065 (raw int: 1065) seconds (0d 0h 0m 2s)

Output example from the hg diff below (not from the same run):

Time: Thu Mar  5 09:28:01 2020 UTC elapsed time: 2.000611 seconds (0d 0h 
0m 2s)


Thanks!
Kevin


$ hg diff
diff --git a/src/hotspot/share/runtime/os.cpp 
b/src/hotspot/share/runtime/os.cpp
--- a/src/hotspot/share/runtime/os.cpp
+++ b/src/hotspot/share/runtime/os.cpp
@@ -1016,10 +1016,9 @@
    }

    double t = os::elapsedTime();
-  // NOTE: It tends to crash after a SEGV if we want to printf("%f",...) in
-  //       Linux. Must be a bug in glibc ? Workaround is to round "t" 
to int
-  //       before printf. We lost some precision, but who cares?
+  // NOTE: a crash using printf("%f",...) on Linux was historically 
noted here.
    int eltime = (int)t;  // elapsed time in seconds
+  int eltimeFraction = (int) ((t - eltime) * 1000000);

    // print elapsed time in a human-readable format:
    int eldays = eltime / secs_per_day;
@@ -1029,7 +1028,7 @@
    int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
    int minute_secs = elmins * secs_per_min;
    int elsecs = (eltime - day_secs - hour_secs - minute_secs);
-  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", eltime, 
eldays, elhours, elmins, elsecs);
+  st->print_cr(" elapsed time: %d.%06d seconds (%dd %dh %dm %ds)", 
eltime, eltimeFraction, eldays, elhours, elmins, elsecs);
  }



On 05/03/2020 00:57, David Holmes wrote:
> On 4/03/2020 8:44 am, Kevin Walls wrote:
>>
>> Thanks David -
>>
>> Yes there are situations where hs_err fails, and few people are 
>> sadder than me
>> when that happens 8-) , so I was thinking about how scared to be by 
>> the comment.
>>
>> With the safety net of the error handler for the steps of the hs_err 
>> file
>> (which works well, we see it invoked frequently), it looks reasonable 
>> to use
>> %f as we might do other slightly questionable things for a signal 
>> handler.
>>
>> Corrupting locale information or floating point state might possibly 
>> cause
>> problems, but if I cause a fake crash in print_date_and_time the error
>> handler recovers and the report continues.
>
> That is good to know.
>
>> Thinking about printing with two ints, seconds and fractions:
>> I don't see anything already that returns such a time in two 
>> components in the
>> JVM, so we might implement a new form of os::javaTimeNanos() or 
>> similar that
>> returns the two parts, and do that on each platform.
>
> I was thinking of something simple/crude ...
>
>> I didn't yet come up with anything to do in os::print_date_and_time()
>> which will take the fractional part of the double, and print just the 
>> fraction as an int, without using any library / %f facilities.
>
> ... just using e.g. (untested)
>
> double t = os::elapsedTime();
> int secs =  (int) t;
> int micros =  (int)((t - secs) * 100000);
> printf("%d.%d", secs, micros);
>
> with appropriate width specifiers to get the formatting right.
>
> Cheers,
> David
>
>>
>> If you're still concerned I could revisit these or some other idea.
>>
>> Genuine laugh out loud moment for me, I backported the elapsed time 
>> logging from
>> 6u4 to 5u19  (https://bugs.openjdk.java.net/browse/JDK-6447157) (2007).
>> (I said before jdk5 was created, I should have said before it was in 
>> mercurial.)
>>
>> Thanks
>> Kevin
>>
>>
>> On 03/03/2020 01:11, David Holmes wrote:
>>> Hi Kevin,
>>>
>>> On 2/03/2020 8:48 pm, Kevin Walls wrote:
>>>> Oops, and with the bug ID in the title and JBS link:
>>>> https://bugs.openjdk.java.net/browse/JDK-8240295
>>>>
>>>>
>>>> On 02/03/2020 10:47, Kevin Walls wrote:
>>>>> Hi,
>>>>>
>>>>> (s11y and runtime opinions both relevant)
>>>>>
>>>>> A few times in the last month I've really wanted to compare the 
>>>>> Events logged in the hs_err file, and the time of the JVM's crash.
>>>>>
>>>>> "elapsed time" in hs_err is only accurate to one second, and has 
>>>>> been since before jdk5 was created.
>>>>>
>>>>> The diff below changes the format string and uses the non-rounded 
>>>>> time value (I don't see a need to change the other integer 
>>>>> arithmetic here), and we can enjoy hs_errs with detail like:
>>>>>
>>>>> ...
>>>>> Time: Mon Mar  2 09:57:13 2020 UTC elapsed time: 5.490135 seconds 
>>>>> (0d 0h 0m 5s)
>>>>> ...
>>>>>
>>>>> Thanks
>>>>> Kevin
>>>>>
>>>>>
>>>>> /jdk/open$ hg diff
>>>>> diff --git a/src/hotspot/share/runtime/os.cpp 
>>>>> b/src/hotspot/share/runtime/os.cpp
>>>>> --- a/src/hotspot/share/runtime/os.cpp
>>>>> +++ b/src/hotspot/share/runtime/os.cpp
>>>>> @@ -1016,9 +1016,8 @@
>>>>>    }
>>>>>
>>>>>    double t = os::elapsedTime();
>>>>> -  // NOTE: It tends to crash after a SEGV if we want to 
>>>>> printf("%f",...) in
>>>>> -  //       Linux. Must be a bug in glibc ? Workaround is to round 
>>>>> "t" to int
>>>>> -  //       before printf. We lost some precision, but who cares?
>>>>> +  // NOTE: a crash using printf("%f",...) on Linux was 
>>>>> historically noted here
>>>>> +  //       (before the jdk5 repo was created).
>>>
>>> Just because it is old doesn't mean it no longer applies. printf is 
>>> not async-signal-safe - we know that but we try to use it anyway. 
>>> Maybe %f is even less async-signal-safe?
>>>
>>> This may get through testing okay but cause problems with real 
>>> crashes in the field.
>>>
>>> What about breaking the time up into two ints: seconds and nanos?
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>>>    int eltime = (int)t;  // elapsed time in seconds
>>>>>
>>>>>    // print elapsed time in a human-readable format:
>>>>> @@ -1029,7 +1028,7 @@
>>>>>    int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
>>>>>    int minute_secs = elmins * secs_per_min;
>>>>>    int elsecs = (eltime - day_secs - hour_secs - minute_secs);
>>>>> -  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", 
>>>>> eltime, eldays, elhours, elmins, elsecs);
>>>>> +  st->print_cr(" elapsed time: %f seconds (%dd %dh %dm %ds)", t, 
>>>>> eldays, elhours, elmins, elsecs);
>>>>>  }
>>>>>
>>>>>


More information about the serviceability-dev mailing list