RFR(s): 8148425: strerror() function is not thread-safe

Thomas Stüfe thomas.stuefe at gmail.com
Tue Mar 15 16:02:31 UTC 2016


Coleen, David,

thank you both!

I will do a follow up change later when there is more time.

Kind regards, Thomas



On Tue, Mar 15, 2016 at 4:02 PM, Coleen Phillimore <
coleen.phillimore at oracle.com> wrote:

>
> This looks fine to me, as-is.
>
> On 3/15/16 8:31 AM, David Holmes wrote:
>
>> Hi Thomas,
>>
>> On 15/03/2016 8:07 PM, Thomas Stüfe wrote:
>>
>>> Hi David,
>>>
>>> sorry, I will have not the time to work on this this week. If the patch
>>> is acceptable to you in this form, I'd say we commit it, otherwise I'll
>>> take it up again some time later.
>>>
>>
>> If Coleen is okay with this as-is I'll commit it in the  morning.
>>
>> About the suggested changes:
>>>
>>> - Merging both APIs into one, like suggested, is fine for me and I would
>>> do this in a follow-up.
>>>
>>
>> Sure.
>>
>> - Handling unknown errno values: I do not like the idea of a fallback to
>>> ::strerror(). Writing the API comment on that one would be weird: "Is
>>> almost threadsafe", "will probably return a static constant string"...
>>> :) I would suggest either using the solution as it is now - only
>>> honouring POSIX errno values or returning "Unknown", or to expand them
>>> with OS specific lists, either in the OS independent form ("#ifdef EBLUB
>>> handle(EBLUB) #endif") or by expanding os::strerror() with platform
>>> dependend os::strerror_pd().
>>>
>>> I am pretty unemotional about either option but would prefer avoiding
>>> ::strerror() altogether.
>>>
>>
>> A follow up on this okay. I'm unsure how best to handle it. Given how
>> rarely we encounter errors I'm probably over-thinking it. ;)
>>
>>
> Yeah, I think you're over-thinking it.  The two APIs with comments make
> sense to me, and something returning Unknown isn't a big deal in real life.
>
> Thanks for the change and thanks for sponsoring this, David.
>
> Coleen
>
>
> Thanks,
>> David
>>
>> Kind Regards, Thomas
>>>
>>>
>>>
>>> On Mon, Mar 14, 2016 at 4:43 AM, David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>> wrote:
>>>
>>>     On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
>>>
>>>         Hi David,
>>>
>>>         thanks again for your review. Here the new version:
>>>
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>>>
>>>         And the diff to the last version:
>>>
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>>>
>>>         I fixed the indention, copyright comments and the typo.
>>>
>>>
>>>     All looks good - thanks.
>>>
>>>         Further comments inline.
>>>
>>>
>>>     Ditto :)
>>>
>>>
>>>         On Fri, Mar 11, 2016 at 8:04 AM, David Holmes
>>>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>>         <mailto:david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>>> wrote:
>>>
>>>              Hi Thomas,
>>>
>>>              Thanks for persevering with this. Sorry for the delayed
>>>         response.
>>>
>>>              On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>>>
>>>                  Hi all,
>>>
>>>                  please take a look at this next version of the fix.
>>>
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>>>
>>>                  I am the first to admit that this is quite ugly :(, but
>>>         it is a
>>>                  pragmatic solution.
>>>
>>>                  Some details:
>>>
>>>                  - I added os::strerror() as a drop-in replacement for
>>>                  ::strerror() - it
>>>                  returns the text as before, but does not localize and
>>>         does not
>>>                  handle
>>>                  unknown error values; so it can be implemented using a
>>>         simple static
>>>                  string table and is always safe to use.
>>>
>>>
>>>              Ok. But I don't think we need os::lasterror() if we have
>>>         this. A
>>>              follow up clean-up is okay for that though.
>>>
>>>
>>>         I would love to keep this functionality, because I like printing
>>>         out the
>>>         short concise errno values, and there are a number of logging
>>>         changes
>>>         coming at least for AIX. It also does not cost much, as the
>>>         large string
>>>         table is already there.
>>>
>>>         If the second API does not suit you, would it be accepable for
>>>         you if I
>>>         hide this functionality behind a second parameter in strerror,
>>>         defaulting to false, like I did in my first version? E.g.
>>>         os::strerror(int e, bool short_name=false); ?
>>>
>>>
>>>     That was what I was thinking of yes. :)
>>>
>>>         But talking about clean-up, could we get rid of os::lasterror()
>>> ? I
>>>         think that function is a bad idea, and it fortunately is only
>>>         called in
>>>         two places, both just need strerror() and no Win32 error
>>>         handling anyway.
>>>
>>>
>>>     Sounds good. lasterror() is a fragile API as it is too easy to
>>>     introduce unexpected system calls between the call to lasterror and
>>>     the action you want to check for the error.
>>>
>>>         I also would like to remove os::get_last_error(). It returns
>>>         errno on
>>>         *nix, win32 error on Windows. Both are different things, plus
>>>         there is
>>>         also an errno on Windows The return value of this function is
>>>         useless
>>>         without knowing what the returned error number actually is, so
>>>         it is not
>>>         of much use.
>>>
>>>
>>>     Simplifying and streamlining all this sounds good.
>>>
>>>              Also it does seem to handle unknown error values.
>>>
>>>                  - I added os::errno_name() for those cases where one
>>>         would want the
>>>                  literalized errno value (e.g. "EINVAL") instead of the
>>>         full text.
>>>
>>>
>>>              Ok. For the unknown case can it return the numeric value
>>>         that it was
>>>              passed?
>>>
>>>
>>>         I would like to avoid this. This would make the function more
>>>         complicated.
>>>
>>>         I'd need to keep this thread-safe, so I either would have to
>>>         allocate
>>>         thread local storage for the return string somehow or,
>>>         alternatively,
>>>         keep a fixed string table a la [][] {"unknown (0)",
>>>         "unknown(1)"} for
>>>         the first n values or so.
>>>
>>>
>>>     Okay. I was thinking this would be simpler but of course we have to
>>>     convert to a string.
>>>
>>>     I note that linux defines ~133 errno values, and you have entries
>>>     for ~75 - are they all the POSIX ones?
>>>
>>>     I'm wondering whether we should have a fallback strategy for an
>>>     unknown (platform-specific) error by using the platform strerror
>>>     even though not thread-safe (the window of exposure is vastly
>>> reduced) ?
>>>
>>>         Also, ::strerror() behaves not consistently for the EINVAL case
>>>         on all
>>>         platforms anyway - on Linux and Bsd, it returns the numerical
>>>         value, but
>>>         on Solaris, Windows and AIX, it just returns "Unknown". Many
>>>         callsites I
>>>         saw take this already into account and print out the numerical
>>> errno
>>>         value in addition to the error string.
>>>
>>>
>>>     I doubt those callsites are not taking into account unknown values -
>>>     I think you will find they originally only reported the numerical
>>>     value and were then augmented to also print the strerror message.
>>>
>>>     Thanks,
>>>     David
>>>
>>>                  - I fixed all callsites in the hotspot which used
>>>         ::strerror to use
>>>                  os::strerror, or, in some few cases (mostly my own
>>>         coding for AIX),
>>>                  os::errno_name().
>>>
>>>
>>>              Bit subjective but okay - I personally don't care either
>>>         way. :)
>>>
>>>                  - To avoid including os.hpp in debug.hpp, I moved the
>>>                  implemntation of
>>>                  assert_status to debug.cpp
>>>
>>>
>>>              Okay ... did the include cause a problem?
>>>
>>>
>>>         os.hpp is a large beast. I wanted to keep it out of debug.hpp,
>>>         to not
>>>         force this include upon anyone who includes debug.hpp just to get
>>>         assert(). I also try to avoid calling system apis in header
>>> files.
>>>
>>>
>>>
>>>              General minor comments:
>>>
>>>              src/os/posix/vm/os_posix.cpp
>>>
>>>              Indentation of wrapped call needs fixing
>>>
>>>
>>>         Fixed
>>>
>>>              ---
>>>
>>>              src/share/vm/runtime/os.hpp
>>>
>>>              Typo:
>>>
>>>              +   //  sugested in the POSIX standard.
>>>
>>>              -> suggested
>>>
>>>
>>>         Fixed
>>>
>>>              ---
>>>
>>>              src/share/vm/utilities/vmError.cpp
>>>
>>>              Good catch on the code that potentially reports the wrong
>>> error
>>>              using get_last_error! Even the simple print_cr(...,
>>>         strerror(errno),
>>>              errno) could in fact print non-matching values if strerror
>>>         itself
>>>              caused an error (unlikely of course :) ).
>>>
>>>
>>>         Thanks :)
>>>
>>>
>>>              Please check all copyright years -
>>>         attachListener_solaris.cpp was
>>>              updated to 2015 instead of 2016 :) Also note correct format
>>>         for two
>>>              year copyrights - in src/share/vm/logging/logFileOutput.cpp
>>>         you have:
>>>
>>>              Copyright (c) 2015,2016 Oracle
>>>
>>>              but it should be:
>>>
>>>              Copyright (c) 2015, 2016, Oracle
>>>
>>>         Fixed
>>>
>>>              Thanks,
>>>              David
>>>
>>>
>>>         Thanks!
>>>
>>>         Thomas
>>>
>>>
>>>                  Kind Regards, Thomas
>>>
>>>
>>>
>>>
>>>                  On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
>>>                  <coleen.phillimore at oracle.com
>>>         <mailto:coleen.phillimore at oracle.com>
>>>                  <mailto:coleen.phillimore at oracle.com
>>>         <mailto:coleen.phillimore at oracle.com>>
>>>                  <mailto:coleen.phillimore at oracle.com
>>>         <mailto:coleen.phillimore at oracle.com>
>>>
>>>                  <mailto:coleen.phillimore at oracle.com
>>>         <mailto:coleen.phillimore at oracle.com>>>> wrote:
>>>
>>>
>>>                       This latest version seems okay (or at least better
>>>         than the
>>>                  other
>>>                       workarounds).
>>>
>>>                       thanks,
>>>                       Coleen
>>>
>>>
>>>                       On 3/1/16 12:30 AM, David Holmes wrote:
>>>
>>>                           Hi Thomas,
>>>
>>>                           On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>>>
>>>                               Hi David,
>>>
>>>                               On Mon, Feb 29, 2016 at 4:44 AM, David
>>> Holmes
>>>                               <david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>
>>>                  <mailto:david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>>
>>>         <mailto:david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>>                  <mailto:david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>>>
>>>                               <mailto:david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>
>>>                  <mailto:david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>>
>>>
>>>                               <mailto:david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>
>>>                  <mailto:david.holmes at oracle.com
>>>         <mailto:david.holmes at oracle.com>>>>> wrote:
>>>
>>>                                    Hi Thomas,
>>>
>>>                                    On 27/02/2016 2:05 AM, Thomas Stüfe
>>>         wrote:
>>>
>>>                                        Hi,
>>>
>>>                                        please take a look at this
>>>         proposed fix:
>>>
>>>                                        Bug:
>>>         https://bugs.openjdk.java.net/browse/JDK-8148425
>>>                                        Webrev:
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>>>
>>>                                        This adds a replacement function
>>>                  os::strerror() as
>>>                               a drop-in for
>>>                                        strerror(), which has a number of
>>>         issues.
>>>
>>>                                        strerror() is unsafe to use and
>>>         may cause
>>>                  (and has
>>>                               caused)
>>>                                        crashes in
>>>                                        multithreaded scenarios. It is
>>>         also not
>>>                  ideal for
>>>                               log files
>>>                                        because of the
>>>                                        implicit localization of the
>>>         error messages.
>>>
>>>                                        For details please see the
>>> discussion
>>>                  under the bug
>>>                               report.
>>>
>>>
>>>                                    I just came across strerror_l, which
>>> is
>>>                  required to be
>>>                               thread-safe.
>>>                                    Is that a possible alternative? (I'm
>>>         not sure how
>>>                               locale's are
>>>                                    obtained).
>>>
>>>
>>>                               Sorry, I think this API is glibc only. At
>>>         least I
>>>                  cannot
>>>                               find this in
>>>                               our AIX headers, nor on Solaris.
>>>
>>>
>>>                           It is a POSIX API:
>>>
>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>>>
>>>                           added in 2006 but part of POSIX.1-2008. But as
>>> per
>>>                  Dalibor's
>>>                           link not necessarily available:
>>>
>>>                           "This function is missing on some platforms:
>>> glibc
>>>                  2.3.6, Mac OS
>>>                           X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8,
>>>         Minix
>>>                  3.1.8, AIX
>>>                           5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11
>>>         2011-11,
>>>                  Cygwin,
>>>                           mingw, MSVC 9, Interix 3.5, BeOS. "
>>>
>>>                           Pity.
>>>
>>>
>>>                                    Otherwise what you have seems okay -
>>>         though I do
>>>                               dislike having to
>>>                                    duplicate all that details already
>>>         buried in
>>>                  the system
>>>                                    headers/library. Not sure we need to
>>>         the long
>>>                  text at
>>>                               the VM level -
>>>                                    which would simplify things a bit.
>>>
>>>
>>>                               I agree, I dislike this too. Like everyone
>>>         else in this
>>>                               thread. But I
>>>                               think this is a pragmatic solution.
>>>
>>>                               I am a bit stuck here - should we really
>>>         get rid of
>>>                  the long
>>>                               text
>>>                               feature? There are some callsites of
>>>         strerror() in the
>>>                               hotspot where
>>>                               arguably the long text is better suited:
>>>                               - in assert_status() (see debug.hpp) -
>>>         this ends up
>>>                  in the
>>>                               header of
>>>                               error files, if this suddenly changes to a
>>>         literalized
>>>                               errno, people may
>>>                               be upset
>>>
>>>
>>>                           I added assert_status purely to expand on what
>>>         the failing
>>>                           status was :) I'm happy to see EINVAL in that
>>>         context
>>>                  rather
>>>                           than "invalid argument" (or whatever) :)
>>>
>>>                               - when failing to write a heap dump file -
>>> see
>>>                               services/heapDumper.cpp.
>>>                               Which ends as printout on the command
>>>         line, I think.
>>>
>>>
>>>                           Maybe ... if it is an error the user can fix.
>>>
>>>                               The safe option would be to provide both
>>>         variants
>>>                  (short and
>>>                               long text).
>>>                               Or, provide the safe, short variant for
>>>         all logging
>>>                  calls - when
>>>                               "EINVAL" is enough, and let users continue
>>>         to use
>>>                  strerror()
>>>                               for those
>>>                               few cases where the long text is needed.
>>>
>>>                               What do you think?
>>>
>>>
>>>                           Trying to decide what makes me least unhappy.
>>>         :) Given
>>>                  you've
>>>                           already done the work I guess having long and
>>>         short
>>>                  forms is okay.
>>>
>>>                           Thanks,
>>>                           David
>>>
>>>                               Thanks, Thomas
>>>
>>>                                    Thanks,
>>>                                    David
>>>
>>>
>>>                                        Please note that I did not yet
>>>         change any call
>>>                               sites, although
>>>                                        all call
>>>                                        sites in the os namespace should
>>>         already
>>>                  use the
>>>                               new function. I
>>>                                        wanted to
>>>                                        see whether there would be any
>>>         general
>>>                  objections.
>>>
>>>                                        Kind Regards, Thomas
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>


More information about the hotspot-runtime-dev mailing list