RFR(s): 8148425: strerror() function is not thread-safe

Thomas Stüfe thomas.stuefe at gmail.com
Tue Mar 15 10:07:59 UTC 2016


Hi David,

sorry, I will have not the time to work on this this week. If the patch is
acceptable to you in this form, I'd say we commit it, otherwise I'll take
it up again some time later.

About the suggested changes:

- Merging both APIs into one, like suggested, is fine for me and I would do
this in a follow-up.

- Handling unknown errno values: I do not like the idea of a fallback to
::strerror(). Writing the API comment on that one would be weird: "Is
almost threadsafe", "will probably return a static constant string"... :) I
would suggest either using the solution as it is now - only honouring POSIX
errno values or returning "Unknown", or to expand them with OS specific
lists, either in the OS independent form ("#ifdef EBLUB handle(EBLUB)
#endif") or by expanding os::strerror() with platform dependend
os::strerror_pd().

I am pretty unemotional about either option but would prefer avoiding
::strerror() altogether.

Kind Regards, Thomas



On Mon, Mar 14, 2016 at 4:43 AM, David Holmes <david.holmes at oracle.com>
wrote:

> On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
>
>> Hi David,
>>
>> thanks again for your review. Here the new version:
>>
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>>
>> And the diff to the last version:
>>
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>>
>> I fixed the indention, copyright comments and the typo.
>>
>
> All looks good - thanks.
>
> Further comments inline.
>>
>
> Ditto :)
>
>
>> On Fri, Mar 11, 2016 at 8:04 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>     Thanks for persevering with this. Sorry for the delayed response.
>>
>>     On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>>
>>         Hi all,
>>
>>         please take a look at this next version of the fix.
>>
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>>
>>         I am the first to admit that this is quite ugly :(, but it is a
>>         pragmatic solution.
>>
>>         Some details:
>>
>>         - I added os::strerror() as a drop-in replacement for
>>         ::strerror() - it
>>         returns the text as before, but does not localize and does not
>>         handle
>>         unknown error values; so it can be implemented using a simple
>> static
>>         string table and is always safe to use.
>>
>>
>>     Ok. But I don't think we need os::lasterror() if we have this. A
>>     follow up clean-up is okay for that though.
>>
>>
>> I would love to keep this functionality, because I like printing out the
>> short concise errno values, and there are a number of logging changes
>> coming at least for AIX. It also does not cost much, as the large string
>> table is already there.
>>
>> If the second API does not suit you, would it be accepable for you if I
>> hide this functionality behind a second parameter in strerror,
>> defaulting to false, like I did in my first version? E.g.
>> os::strerror(int e, bool short_name=false); ?
>>
>
> That was what I was thinking of yes. :)
>
> But talking about clean-up, could we get rid of os::lasterror() ? I
>> think that function is a bad idea, and it fortunately is only called in
>> two places, both just need strerror() and no Win32 error handling anyway.
>>
>
> Sounds good. lasterror() is a fragile API as it is too easy to introduce
> unexpected system calls between the call to lasterror and the action you
> want to check for the error.
>
> I also would like to remove os::get_last_error(). It returns errno on
>> *nix, win32 error on Windows. Both are different things, plus there is
>> also an errno on Windows The return value of this function is useless
>> without knowing what the returned error number actually is, so it is not
>> of much use.
>>
>
> Simplifying and streamlining all this sounds good.
>
>     Also it does seem to handle unknown error values.
>>
>>         - I added os::errno_name() for those cases where one would want
>> the
>>         literalized errno value (e.g. "EINVAL") instead of the full text.
>>
>>
>>     Ok. For the unknown case can it return the numeric value that it was
>>     passed?
>>
>>
>> I would like to avoid this. This would make the function more complicated.
>>
>> I'd need to keep this thread-safe, so I either would have to allocate
>> thread local storage for the return string somehow or, alternatively,
>> keep a fixed string table a la [][] {"unknown (0)", "unknown(1)"} for
>> the first n values or so.
>>
>
> Okay. I was thinking this would be simpler but of course we have to
> convert to a string.
>
> I note that linux defines ~133 errno values, and you have entries for ~75
> - are they all the POSIX ones?
>
> I'm wondering whether we should have a fallback strategy for an unknown
> (platform-specific) error by using the platform strerror even though not
> thread-safe (the window of exposure is vastly reduced) ?
>
> Also, ::strerror() behaves not consistently for the EINVAL case on all
>> platforms anyway - on Linux and Bsd, it returns the numerical value, but
>> on Solaris, Windows and AIX, it just returns "Unknown". Many callsites I
>> saw take this already into account and print out the numerical errno
>> value in addition to the error string.
>>
>
> I doubt those callsites are not taking into account unknown values - I
> think you will find they originally only reported the numerical value and
> were then augmented to also print the strerror message.
>
> Thanks,
> David
>
>         - I fixed all callsites in the hotspot which used ::strerror to use
>>         os::strerror, or, in some few cases (mostly my own coding for
>> AIX),
>>         os::errno_name().
>>
>>
>>     Bit subjective but okay - I personally don't care either way. :)
>>
>>         - To avoid including os.hpp in debug.hpp, I moved the
>>         implemntation of
>>         assert_status to debug.cpp
>>
>>
>>     Okay ... did the include cause a problem?
>>
>>
>> os.hpp is a large beast. I wanted to keep it out of debug.hpp, to not
>> force this include upon anyone who includes debug.hpp just to get
>> assert(). I also try to avoid calling system apis in header files.
>>
>>
>>
>>     General minor comments:
>>
>>     src/os/posix/vm/os_posix.cpp
>>
>>     Indentation of wrapped call needs fixing
>>
>>
>> Fixed
>>
>>     ---
>>
>>     src/share/vm/runtime/os.hpp
>>
>>     Typo:
>>
>>     +   //  sugested in the POSIX standard.
>>
>>     -> suggested
>>
>>
>> Fixed
>>
>>     ---
>>
>>     src/share/vm/utilities/vmError.cpp
>>
>>     Good catch on the code that potentially reports the wrong error
>>     using get_last_error! Even the simple print_cr(..., strerror(errno),
>>     errno) could in fact print non-matching values if strerror itself
>>     caused an error (unlikely of course :) ).
>>
>>
>> Thanks :)
>>
>>
>>     Please check all copyright years - attachListener_solaris.cpp was
>>     updated to 2015 instead of 2016 :) Also note correct format for two
>>     year copyrights - in src/share/vm/logging/logFileOutput.cpp you have:
>>
>>     Copyright (c) 2015,2016 Oracle
>>
>>     but it should be:
>>
>>     Copyright (c) 2015, 2016, Oracle
>>
>> Fixed
>>
>>     Thanks,
>>     David
>>
>>
>> Thanks!
>>
>> Thomas
>>
>>
>>         Kind Regards, Thomas
>>
>>
>>
>>
>>         On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
>>         <coleen.phillimore at oracle.com
>>         <mailto:coleen.phillimore at oracle.com>
>>         <mailto:coleen.phillimore at oracle.com
>>
>>         <mailto:coleen.phillimore at oracle.com>>> wrote:
>>
>>
>>              This latest version seems okay (or at least better than the
>>         other
>>              workarounds).
>>
>>              thanks,
>>              Coleen
>>
>>
>>              On 3/1/16 12:30 AM, David Holmes wrote:
>>
>>                  Hi Thomas,
>>
>>                  On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>>
>>                      Hi David,
>>
>>                      On Mon, Feb 29, 2016 at 4:44 AM, David Holmes
>>                      <david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com> <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>>
>>                      <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>
>>
>>                      <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>>>> wrote:
>>
>>                           Hi Thomas,
>>
>>                           On 27/02/2016 2:05 AM, Thomas Stüfe wrote:
>>
>>                               Hi,
>>
>>                               please take a look at this proposed fix:
>>
>>                               Bug:
>>         https://bugs.openjdk.java.net/browse/JDK-8148425
>>                               Webrev:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>>
>>                               This adds a replacement function
>>         os::strerror() as
>>                      a drop-in for
>>                               strerror(), which has a number of issues.
>>
>>                               strerror() is unsafe to use and may cause
>>         (and has
>>                      caused)
>>                               crashes in
>>                               multithreaded scenarios. It is also not
>>         ideal for
>>                      log files
>>                               because of the
>>                               implicit localization of the error messages.
>>
>>                               For details please see the discussion
>>         under the bug
>>                      report.
>>
>>
>>                           I just came across strerror_l, which is
>>         required to be
>>                      thread-safe.
>>                           Is that a possible alternative? (I'm not sure
>> how
>>                      locale's are
>>                           obtained).
>>
>>
>>                      Sorry, I think this API is glibc only. At least I
>>         cannot
>>                      find this in
>>                      our AIX headers, nor on Solaris.
>>
>>
>>                  It is a POSIX API:
>>
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>>
>>                  added in 2006 but part of POSIX.1-2008. But as per
>>         Dalibor's
>>                  link not necessarily available:
>>
>>                  "This function is missing on some platforms: glibc
>>         2.3.6, Mac OS
>>                  X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, Minix
>>         3.1.8, AIX
>>                  5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11 2011-11,
>>         Cygwin,
>>                  mingw, MSVC 9, Interix 3.5, BeOS. "
>>
>>                  Pity.
>>
>>
>>                           Otherwise what you have seems okay - though I do
>>                      dislike having to
>>                           duplicate all that details already buried in
>>         the system
>>                           headers/library. Not sure we need to the long
>>         text at
>>                      the VM level -
>>                           which would simplify things a bit.
>>
>>
>>                      I agree, I dislike this too. Like everyone else in
>> this
>>                      thread. But I
>>                      think this is a pragmatic solution.
>>
>>                      I am a bit stuck here - should we really get rid of
>>         the long
>>                      text
>>                      feature? There are some callsites of strerror() in
>> the
>>                      hotspot where
>>                      arguably the long text is better suited:
>>                      - in assert_status() (see debug.hpp) - this ends up
>>         in the
>>                      header of
>>                      error files, if this suddenly changes to a
>> literalized
>>                      errno, people may
>>                      be upset
>>
>>
>>                  I added assert_status purely to expand on what the
>> failing
>>                  status was :) I'm happy to see EINVAL in that context
>>         rather
>>                  than "invalid argument" (or whatever) :)
>>
>>                      - when failing to write a heap dump file - see
>>                      services/heapDumper.cpp.
>>                      Which ends as printout on the command line, I think.
>>
>>
>>                  Maybe ... if it is an error the user can fix.
>>
>>                      The safe option would be to provide both variants
>>         (short and
>>                      long text).
>>                      Or, provide the safe, short variant for all logging
>>         calls - when
>>                      "EINVAL" is enough, and let users continue to use
>>         strerror()
>>                      for those
>>                      few cases where the long text is needed.
>>
>>                      What do you think?
>>
>>
>>                  Trying to decide what makes me least unhappy. :) Given
>>         you've
>>                  already done the work I guess having long and short
>>         forms is okay.
>>
>>                  Thanks,
>>                  David
>>
>>                      Thanks, Thomas
>>
>>                           Thanks,
>>                           David
>>
>>
>>                               Please note that I did not yet change any
>> call
>>                      sites, although
>>                               all call
>>                               sites in the os namespace should already
>>         use the
>>                      new function. I
>>                               wanted to
>>                               see whether there would be any general
>>         objections.
>>
>>                               Kind Regards, Thomas
>>
>>
>>
>>
>>
>>


More information about the hotspot-runtime-dev mailing list