RFR(s): 8148425: strerror() function is not thread-safe

Coleen Phillimore coleen.phillimore at oracle.com
Tue Mar 15 15:02:37 UTC 2016


This looks fine to me, as-is.

On 3/15/16 8:31 AM, David Holmes wrote:
> Hi Thomas,
>
> On 15/03/2016 8:07 PM, Thomas Stüfe wrote:
>> Hi David,
>>
>> sorry, I will have not the time to work on this this week. If the patch
>> is acceptable to you in this form, I'd say we commit it, otherwise I'll
>> take it up again some time later.
>
> If Coleen is okay with this as-is I'll commit it in the  morning.
>
>> About the suggested changes:
>>
>> - Merging both APIs into one, like suggested, is fine for me and I would
>> do this in a follow-up.
>
> Sure.
>
>> - Handling unknown errno values: I do not like the idea of a fallback to
>> ::strerror(). Writing the API comment on that one would be weird: "Is
>> almost threadsafe", "will probably return a static constant string"...
>> :) I would suggest either using the solution as it is now - only
>> honouring POSIX errno values or returning "Unknown", or to expand them
>> with OS specific lists, either in the OS independent form ("#ifdef EBLUB
>> handle(EBLUB) #endif") or by expanding os::strerror() with platform
>> dependend os::strerror_pd().
>>
>> I am pretty unemotional about either option but would prefer avoiding
>> ::strerror() altogether.
>
> A follow up on this okay. I'm unsure how best to handle it. Given how 
> rarely we encounter errors I'm probably over-thinking it. ;)
>

Yeah, I think you're over-thinking it.  The two APIs with comments make 
sense to me, and something returning Unknown isn't a big deal in real life.

Thanks for the change and thanks for sponsoring this, David.

Coleen

> Thanks,
> David
>
>> Kind Regards, Thomas
>>
>>
>>
>> On Mon, Mar 14, 2016 at 4:43 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
>>
>>         Hi David,
>>
>>         thanks again for your review. Here the new version:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>>
>>         And the diff to the last version:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>>
>>         I fixed the indention, copyright comments and the typo.
>>
>>
>>     All looks good - thanks.
>>
>>         Further comments inline.
>>
>>
>>     Ditto :)
>>
>>
>>         On Fri, Mar 11, 2016 at 8:04 AM, David Holmes
>>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>         <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>>> wrote:
>>
>>              Hi Thomas,
>>
>>              Thanks for persevering with this. Sorry for the delayed
>>         response.
>>
>>              On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>>
>>                  Hi all,
>>
>>                  please take a look at this next version of the fix.
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>>
>>                  I am the first to admit that this is quite ugly :(, but
>>         it is a
>>                  pragmatic solution.
>>
>>                  Some details:
>>
>>                  - I added os::strerror() as a drop-in replacement for
>>                  ::strerror() - it
>>                  returns the text as before, but does not localize and
>>         does not
>>                  handle
>>                  unknown error values; so it can be implemented using a
>>         simple static
>>                  string table and is always safe to use.
>>
>>
>>              Ok. But I don't think we need os::lasterror() if we have
>>         this. A
>>              follow up clean-up is okay for that though.
>>
>>
>>         I would love to keep this functionality, because I like printing
>>         out the
>>         short concise errno values, and there are a number of logging
>>         changes
>>         coming at least for AIX. It also does not cost much, as the
>>         large string
>>         table is already there.
>>
>>         If the second API does not suit you, would it be accepable for
>>         you if I
>>         hide this functionality behind a second parameter in strerror,
>>         defaulting to false, like I did in my first version? E.g.
>>         os::strerror(int e, bool short_name=false); ?
>>
>>
>>     That was what I was thinking of yes. :)
>>
>>         But talking about clean-up, could we get rid of 
>> os::lasterror() ? I
>>         think that function is a bad idea, and it fortunately is only
>>         called in
>>         two places, both just need strerror() and no Win32 error
>>         handling anyway.
>>
>>
>>     Sounds good. lasterror() is a fragile API as it is too easy to
>>     introduce unexpected system calls between the call to lasterror and
>>     the action you want to check for the error.
>>
>>         I also would like to remove os::get_last_error(). It returns
>>         errno on
>>         *nix, win32 error on Windows. Both are different things, plus
>>         there is
>>         also an errno on Windows The return value of this function is
>>         useless
>>         without knowing what the returned error number actually is, so
>>         it is not
>>         of much use.
>>
>>
>>     Simplifying and streamlining all this sounds good.
>>
>>              Also it does seem to handle unknown error values.
>>
>>                  - I added os::errno_name() for those cases where one
>>         would want the
>>                  literalized errno value (e.g. "EINVAL") instead of the
>>         full text.
>>
>>
>>              Ok. For the unknown case can it return the numeric value
>>         that it was
>>              passed?
>>
>>
>>         I would like to avoid this. This would make the function more
>>         complicated.
>>
>>         I'd need to keep this thread-safe, so I either would have to
>>         allocate
>>         thread local storage for the return string somehow or,
>>         alternatively,
>>         keep a fixed string table a la [][] {"unknown (0)",
>>         "unknown(1)"} for
>>         the first n values or so.
>>
>>
>>     Okay. I was thinking this would be simpler but of course we have to
>>     convert to a string.
>>
>>     I note that linux defines ~133 errno values, and you have entries
>>     for ~75 - are they all the POSIX ones?
>>
>>     I'm wondering whether we should have a fallback strategy for an
>>     unknown (platform-specific) error by using the platform strerror
>>     even though not thread-safe (the window of exposure is vastly 
>> reduced) ?
>>
>>         Also, ::strerror() behaves not consistently for the EINVAL case
>>         on all
>>         platforms anyway - on Linux and Bsd, it returns the numerical
>>         value, but
>>         on Solaris, Windows and AIX, it just returns "Unknown". Many
>>         callsites I
>>         saw take this already into account and print out the 
>> numerical errno
>>         value in addition to the error string.
>>
>>
>>     I doubt those callsites are not taking into account unknown values -
>>     I think you will find they originally only reported the numerical
>>     value and were then augmented to also print the strerror message.
>>
>>     Thanks,
>>     David
>>
>>                  - I fixed all callsites in the hotspot which used
>>         ::strerror to use
>>                  os::strerror, or, in some few cases (mostly my own
>>         coding for AIX),
>>                  os::errno_name().
>>
>>
>>              Bit subjective but okay - I personally don't care either
>>         way. :)
>>
>>                  - To avoid including os.hpp in debug.hpp, I moved the
>>                  implemntation of
>>                  assert_status to debug.cpp
>>
>>
>>              Okay ... did the include cause a problem?
>>
>>
>>         os.hpp is a large beast. I wanted to keep it out of debug.hpp,
>>         to not
>>         force this include upon anyone who includes debug.hpp just to 
>> get
>>         assert(). I also try to avoid calling system apis in header 
>> files.
>>
>>
>>
>>              General minor comments:
>>
>>              src/os/posix/vm/os_posix.cpp
>>
>>              Indentation of wrapped call needs fixing
>>
>>
>>         Fixed
>>
>>              ---
>>
>>              src/share/vm/runtime/os.hpp
>>
>>              Typo:
>>
>>              +   //  sugested in the POSIX standard.
>>
>>              -> suggested
>>
>>
>>         Fixed
>>
>>              ---
>>
>>              src/share/vm/utilities/vmError.cpp
>>
>>              Good catch on the code that potentially reports the 
>> wrong error
>>              using get_last_error! Even the simple print_cr(...,
>>         strerror(errno),
>>              errno) could in fact print non-matching values if strerror
>>         itself
>>              caused an error (unlikely of course :) ).
>>
>>
>>         Thanks :)
>>
>>
>>              Please check all copyright years -
>>         attachListener_solaris.cpp was
>>              updated to 2015 instead of 2016 :) Also note correct format
>>         for two
>>              year copyrights - in src/share/vm/logging/logFileOutput.cpp
>>         you have:
>>
>>              Copyright (c) 2015,2016 Oracle
>>
>>              but it should be:
>>
>>              Copyright (c) 2015, 2016, Oracle
>>
>>         Fixed
>>
>>              Thanks,
>>              David
>>
>>
>>         Thanks!
>>
>>         Thomas
>>
>>
>>                  Kind Regards, Thomas
>>
>>
>>
>>
>>                  On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
>>                  <coleen.phillimore at oracle.com
>>         <mailto:coleen.phillimore at oracle.com>
>>                  <mailto:coleen.phillimore at oracle.com
>>         <mailto:coleen.phillimore at oracle.com>>
>>                  <mailto:coleen.phillimore at oracle.com
>>         <mailto:coleen.phillimore at oracle.com>
>>
>>                  <mailto:coleen.phillimore at oracle.com
>>         <mailto:coleen.phillimore at oracle.com>>>> wrote:
>>
>>
>>                       This latest version seems okay (or at least better
>>         than the
>>                  other
>>                       workarounds).
>>
>>                       thanks,
>>                       Coleen
>>
>>
>>                       On 3/1/16 12:30 AM, David Holmes wrote:
>>
>>                           Hi Thomas,
>>
>>                           On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>>
>>                               Hi David,
>>
>>                               On Mon, Feb 29, 2016 at 4:44 AM, David 
>> Holmes
>>                               <david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>
>>                  <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>>
>>         <mailto:david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>                  <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>>>
>>                               <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>
>>                  <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>>
>>
>>                               <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>
>>                  <mailto:david.holmes at oracle.com
>>         <mailto:david.holmes at oracle.com>>>>> wrote:
>>
>>                                    Hi Thomas,
>>
>>                                    On 27/02/2016 2:05 AM, Thomas Stüfe
>>         wrote:
>>
>>                                        Hi,
>>
>>                                        please take a look at this
>>         proposed fix:
>>
>>                                        Bug:
>>         https://bugs.openjdk.java.net/browse/JDK-8148425
>>                                        Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>>
>>                                        This adds a replacement function
>>                  os::strerror() as
>>                               a drop-in for
>>                                        strerror(), which has a number of
>>         issues.
>>
>>                                        strerror() is unsafe to use and
>>         may cause
>>                  (and has
>>                               caused)
>>                                        crashes in
>>                                        multithreaded scenarios. It is
>>         also not
>>                  ideal for
>>                               log files
>>                                        because of the
>>                                        implicit localization of the
>>         error messages.
>>
>>                                        For details please see the 
>> discussion
>>                  under the bug
>>                               report.
>>
>>
>>                                    I just came across strerror_l, 
>> which is
>>                  required to be
>>                               thread-safe.
>>                                    Is that a possible alternative? (I'm
>>         not sure how
>>                               locale's are
>>                                    obtained).
>>
>>
>>                               Sorry, I think this API is glibc only. At
>>         least I
>>                  cannot
>>                               find this in
>>                               our AIX headers, nor on Solaris.
>>
>>
>>                           It is a POSIX API:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>>
>>                           added in 2006 but part of POSIX.1-2008. But 
>> as per
>>                  Dalibor's
>>                           link not necessarily available:
>>
>>                           "This function is missing on some 
>> platforms: glibc
>>                  2.3.6, Mac OS
>>                           X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8,
>>         Minix
>>                  3.1.8, AIX
>>                           5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11
>>         2011-11,
>>                  Cygwin,
>>                           mingw, MSVC 9, Interix 3.5, BeOS. "
>>
>>                           Pity.
>>
>>
>>                                    Otherwise what you have seems okay -
>>         though I do
>>                               dislike having to
>>                                    duplicate all that details already
>>         buried in
>>                  the system
>>                                    headers/library. Not sure we need to
>>         the long
>>                  text at
>>                               the VM level -
>>                                    which would simplify things a bit.
>>
>>
>>                               I agree, I dislike this too. Like everyone
>>         else in this
>>                               thread. But I
>>                               think this is a pragmatic solution.
>>
>>                               I am a bit stuck here - should we really
>>         get rid of
>>                  the long
>>                               text
>>                               feature? There are some callsites of
>>         strerror() in the
>>                               hotspot where
>>                               arguably the long text is better suited:
>>                               - in assert_status() (see debug.hpp) -
>>         this ends up
>>                  in the
>>                               header of
>>                               error files, if this suddenly changes to a
>>         literalized
>>                               errno, people may
>>                               be upset
>>
>>
>>                           I added assert_status purely to expand on what
>>         the failing
>>                           status was :) I'm happy to see EINVAL in that
>>         context
>>                  rather
>>                           than "invalid argument" (or whatever) :)
>>
>>                               - when failing to write a heap dump 
>> file - see
>>                               services/heapDumper.cpp.
>>                               Which ends as printout on the command
>>         line, I think.
>>
>>
>>                           Maybe ... if it is an error the user can fix.
>>
>>                               The safe option would be to provide both
>>         variants
>>                  (short and
>>                               long text).
>>                               Or, provide the safe, short variant for
>>         all logging
>>                  calls - when
>>                               "EINVAL" is enough, and let users continue
>>         to use
>>                  strerror()
>>                               for those
>>                               few cases where the long text is needed.
>>
>>                               What do you think?
>>
>>
>>                           Trying to decide what makes me least unhappy.
>>         :) Given
>>                  you've
>>                           already done the work I guess having long and
>>         short
>>                  forms is okay.
>>
>>                           Thanks,
>>                           David
>>
>>                               Thanks, Thomas
>>
>>                                    Thanks,
>>                                    David
>>
>>
>>                                        Please note that I did not yet
>>         change any call
>>                               sites, although
>>                                        all call
>>                                        sites in the os namespace should
>>         already
>>                  use the
>>                               new function. I
>>                                        wanted to
>>                                        see whether there would be any
>>         general
>>                  objections.
>>
>>                                        Kind Regards, Thomas
>>
>>
>>
>>
>>
>>



More information about the hotspot-runtime-dev mailing list