RFR(s): 8148425: strerror() function is not thread-safe

David Holmes david.holmes at oracle.com
Tue Mar 15 12:31:23 UTC 2016


Hi Thomas,

On 15/03/2016 8:07 PM, Thomas Stüfe wrote:
> Hi David,
>
> sorry, I will have not the time to work on this this week. If the patch
> is acceptable to you in this form, I'd say we commit it, otherwise I'll
> take it up again some time later.

If Coleen is okay with this as-is I'll commit it in the  morning.

> About the suggested changes:
>
> - Merging both APIs into one, like suggested, is fine for me and I would
> do this in a follow-up.

Sure.

> - Handling unknown errno values: I do not like the idea of a fallback to
> ::strerror(). Writing the API comment on that one would be weird: "Is
> almost threadsafe", "will probably return a static constant string"...
> :) I would suggest either using the solution as it is now - only
> honouring POSIX errno values or returning "Unknown", or to expand them
> with OS specific lists, either in the OS independent form ("#ifdef EBLUB
> handle(EBLUB) #endif") or by expanding os::strerror() with platform
> dependend os::strerror_pd().
>
> I am pretty unemotional about either option but would prefer avoiding
> ::strerror() altogether.

A follow up on this okay. I'm unsure how best to handle it. Given how 
rarely we encounter errors I'm probably over-thinking it. ;)

Thanks,
David

> Kind Regards, Thomas
>
>
>
> On Mon, Mar 14, 2016 at 4:43 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
>
>         Hi David,
>
>         thanks again for your review. Here the new version:
>
>         http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>
>         And the diff to the last version:
>
>         http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>
>         I fixed the indention, copyright comments and the typo.
>
>
>     All looks good - thanks.
>
>         Further comments inline.
>
>
>     Ditto :)
>
>
>         On Fri, Mar 11, 2016 at 8:04 AM, David Holmes
>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>         <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>> wrote:
>
>              Hi Thomas,
>
>              Thanks for persevering with this. Sorry for the delayed
>         response.
>
>              On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>
>                  Hi all,
>
>                  please take a look at this next version of the fix.
>
>         http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>
>                  I am the first to admit that this is quite ugly :(, but
>         it is a
>                  pragmatic solution.
>
>                  Some details:
>
>                  - I added os::strerror() as a drop-in replacement for
>                  ::strerror() - it
>                  returns the text as before, but does not localize and
>         does not
>                  handle
>                  unknown error values; so it can be implemented using a
>         simple static
>                  string table and is always safe to use.
>
>
>              Ok. But I don't think we need os::lasterror() if we have
>         this. A
>              follow up clean-up is okay for that though.
>
>
>         I would love to keep this functionality, because I like printing
>         out the
>         short concise errno values, and there are a number of logging
>         changes
>         coming at least for AIX. It also does not cost much, as the
>         large string
>         table is already there.
>
>         If the second API does not suit you, would it be accepable for
>         you if I
>         hide this functionality behind a second parameter in strerror,
>         defaulting to false, like I did in my first version? E.g.
>         os::strerror(int e, bool short_name=false); ?
>
>
>     That was what I was thinking of yes. :)
>
>         But talking about clean-up, could we get rid of os::lasterror() ? I
>         think that function is a bad idea, and it fortunately is only
>         called in
>         two places, both just need strerror() and no Win32 error
>         handling anyway.
>
>
>     Sounds good. lasterror() is a fragile API as it is too easy to
>     introduce unexpected system calls between the call to lasterror and
>     the action you want to check for the error.
>
>         I also would like to remove os::get_last_error(). It returns
>         errno on
>         *nix, win32 error on Windows. Both are different things, plus
>         there is
>         also an errno on Windows The return value of this function is
>         useless
>         without knowing what the returned error number actually is, so
>         it is not
>         of much use.
>
>
>     Simplifying and streamlining all this sounds good.
>
>              Also it does seem to handle unknown error values.
>
>                  - I added os::errno_name() for those cases where one
>         would want the
>                  literalized errno value (e.g. "EINVAL") instead of the
>         full text.
>
>
>              Ok. For the unknown case can it return the numeric value
>         that it was
>              passed?
>
>
>         I would like to avoid this. This would make the function more
>         complicated.
>
>         I'd need to keep this thread-safe, so I either would have to
>         allocate
>         thread local storage for the return string somehow or,
>         alternatively,
>         keep a fixed string table a la [][] {"unknown (0)",
>         "unknown(1)"} for
>         the first n values or so.
>
>
>     Okay. I was thinking this would be simpler but of course we have to
>     convert to a string.
>
>     I note that linux defines ~133 errno values, and you have entries
>     for ~75 - are they all the POSIX ones?
>
>     I'm wondering whether we should have a fallback strategy for an
>     unknown (platform-specific) error by using the platform strerror
>     even though not thread-safe (the window of exposure is vastly reduced) ?
>
>         Also, ::strerror() behaves not consistently for the EINVAL case
>         on all
>         platforms anyway - on Linux and Bsd, it returns the numerical
>         value, but
>         on Solaris, Windows and AIX, it just returns "Unknown". Many
>         callsites I
>         saw take this already into account and print out the numerical errno
>         value in addition to the error string.
>
>
>     I doubt those callsites are not taking into account unknown values -
>     I think you will find they originally only reported the numerical
>     value and were then augmented to also print the strerror message.
>
>     Thanks,
>     David
>
>                  - I fixed all callsites in the hotspot which used
>         ::strerror to use
>                  os::strerror, or, in some few cases (mostly my own
>         coding for AIX),
>                  os::errno_name().
>
>
>              Bit subjective but okay - I personally don't care either
>         way. :)
>
>                  - To avoid including os.hpp in debug.hpp, I moved the
>                  implemntation of
>                  assert_status to debug.cpp
>
>
>              Okay ... did the include cause a problem?
>
>
>         os.hpp is a large beast. I wanted to keep it out of debug.hpp,
>         to not
>         force this include upon anyone who includes debug.hpp just to get
>         assert(). I also try to avoid calling system apis in header files.
>
>
>
>              General minor comments:
>
>              src/os/posix/vm/os_posix.cpp
>
>              Indentation of wrapped call needs fixing
>
>
>         Fixed
>
>              ---
>
>              src/share/vm/runtime/os.hpp
>
>              Typo:
>
>              +   //  sugested in the POSIX standard.
>
>              -> suggested
>
>
>         Fixed
>
>              ---
>
>              src/share/vm/utilities/vmError.cpp
>
>              Good catch on the code that potentially reports the wrong error
>              using get_last_error! Even the simple print_cr(...,
>         strerror(errno),
>              errno) could in fact print non-matching values if strerror
>         itself
>              caused an error (unlikely of course :) ).
>
>
>         Thanks :)
>
>
>              Please check all copyright years -
>         attachListener_solaris.cpp was
>              updated to 2015 instead of 2016 :) Also note correct format
>         for two
>              year copyrights - in src/share/vm/logging/logFileOutput.cpp
>         you have:
>
>              Copyright (c) 2015,2016 Oracle
>
>              but it should be:
>
>              Copyright (c) 2015, 2016, Oracle
>
>         Fixed
>
>              Thanks,
>              David
>
>
>         Thanks!
>
>         Thomas
>
>
>                  Kind Regards, Thomas
>
>
>
>
>                  On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
>                  <coleen.phillimore at oracle.com
>         <mailto:coleen.phillimore at oracle.com>
>                  <mailto:coleen.phillimore at oracle.com
>         <mailto:coleen.phillimore at oracle.com>>
>                  <mailto:coleen.phillimore at oracle.com
>         <mailto:coleen.phillimore at oracle.com>
>
>                  <mailto:coleen.phillimore at oracle.com
>         <mailto:coleen.phillimore at oracle.com>>>> wrote:
>
>
>                       This latest version seems okay (or at least better
>         than the
>                  other
>                       workarounds).
>
>                       thanks,
>                       Coleen
>
>
>                       On 3/1/16 12:30 AM, David Holmes wrote:
>
>                           Hi Thomas,
>
>                           On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>
>                               Hi David,
>
>                               On Mon, Feb 29, 2016 at 4:44 AM, David Holmes
>                               <david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>
>                  <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>
>         <mailto:david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>                  <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>>
>                               <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>
>                  <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>
>
>                               <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>
>                  <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>>>> wrote:
>
>                                    Hi Thomas,
>
>                                    On 27/02/2016 2:05 AM, Thomas Stüfe
>         wrote:
>
>                                        Hi,
>
>                                        please take a look at this
>         proposed fix:
>
>                                        Bug:
>         https://bugs.openjdk.java.net/browse/JDK-8148425
>                                        Webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>
>                                        This adds a replacement function
>                  os::strerror() as
>                               a drop-in for
>                                        strerror(), which has a number of
>         issues.
>
>                                        strerror() is unsafe to use and
>         may cause
>                  (and has
>                               caused)
>                                        crashes in
>                                        multithreaded scenarios. It is
>         also not
>                  ideal for
>                               log files
>                                        because of the
>                                        implicit localization of the
>         error messages.
>
>                                        For details please see the discussion
>                  under the bug
>                               report.
>
>
>                                    I just came across strerror_l, which is
>                  required to be
>                               thread-safe.
>                                    Is that a possible alternative? (I'm
>         not sure how
>                               locale's are
>                                    obtained).
>
>
>                               Sorry, I think this API is glibc only. At
>         least I
>                  cannot
>                               find this in
>                               our AIX headers, nor on Solaris.
>
>
>                           It is a POSIX API:
>
>         http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>
>                           added in 2006 but part of POSIX.1-2008. But as per
>                  Dalibor's
>                           link not necessarily available:
>
>                           "This function is missing on some platforms: glibc
>                  2.3.6, Mac OS
>                           X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8,
>         Minix
>                  3.1.8, AIX
>                           5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11
>         2011-11,
>                  Cygwin,
>                           mingw, MSVC 9, Interix 3.5, BeOS. "
>
>                           Pity.
>
>
>                                    Otherwise what you have seems okay -
>         though I do
>                               dislike having to
>                                    duplicate all that details already
>         buried in
>                  the system
>                                    headers/library. Not sure we need to
>         the long
>                  text at
>                               the VM level -
>                                    which would simplify things a bit.
>
>
>                               I agree, I dislike this too. Like everyone
>         else in this
>                               thread. But I
>                               think this is a pragmatic solution.
>
>                               I am a bit stuck here - should we really
>         get rid of
>                  the long
>                               text
>                               feature? There are some callsites of
>         strerror() in the
>                               hotspot where
>                               arguably the long text is better suited:
>                               - in assert_status() (see debug.hpp) -
>         this ends up
>                  in the
>                               header of
>                               error files, if this suddenly changes to a
>         literalized
>                               errno, people may
>                               be upset
>
>
>                           I added assert_status purely to expand on what
>         the failing
>                           status was :) I'm happy to see EINVAL in that
>         context
>                  rather
>                           than "invalid argument" (or whatever) :)
>
>                               - when failing to write a heap dump file - see
>                               services/heapDumper.cpp.
>                               Which ends as printout on the command
>         line, I think.
>
>
>                           Maybe ... if it is an error the user can fix.
>
>                               The safe option would be to provide both
>         variants
>                  (short and
>                               long text).
>                               Or, provide the safe, short variant for
>         all logging
>                  calls - when
>                               "EINVAL" is enough, and let users continue
>         to use
>                  strerror()
>                               for those
>                               few cases where the long text is needed.
>
>                               What do you think?
>
>
>                           Trying to decide what makes me least unhappy.
>         :) Given
>                  you've
>                           already done the work I guess having long and
>         short
>                  forms is okay.
>
>                           Thanks,
>                           David
>
>                               Thanks, Thomas
>
>                                    Thanks,
>                                    David
>
>
>                                        Please note that I did not yet
>         change any call
>                               sites, although
>                                        all call
>                                        sites in the os namespace should
>         already
>                  use the
>                               new function. I
>                                        wanted to
>                                        see whether there would be any
>         general
>                  objections.
>
>                                        Kind Regards, Thomas
>
>
>
>
>
>


More information about the hotspot-runtime-dev mailing list