RFR(s): 8148425: strerror() function is not thread-safe

David Holmes david.holmes at oracle.com
Mon Mar 14 03:43:04 UTC 2016


On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
> Hi David,
>
> thanks again for your review. Here the new version:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>
> And the diff to the last version:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>
> I fixed the indention, copyright comments and the typo.

All looks good - thanks.

> Further comments inline.

Ditto :)

>
> On Fri, Mar 11, 2016 at 8:04 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Hi Thomas,
>
>     Thanks for persevering with this. Sorry for the delayed response.
>
>     On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>
>         Hi all,
>
>         please take a look at this next version of the fix.
>
>         http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>
>         I am the first to admit that this is quite ugly :(, but it is a
>         pragmatic solution.
>
>         Some details:
>
>         - I added os::strerror() as a drop-in replacement for
>         ::strerror() - it
>         returns the text as before, but does not localize and does not
>         handle
>         unknown error values; so it can be implemented using a simple static
>         string table and is always safe to use.
>
>
>     Ok. But I don't think we need os::lasterror() if we have this. A
>     follow up clean-up is okay for that though.
>
>
> I would love to keep this functionality, because I like printing out the
> short concise errno values, and there are a number of logging changes
> coming at least for AIX. It also does not cost much, as the large string
> table is already there.
>
> If the second API does not suit you, would it be accepable for you if I
> hide this functionality behind a second parameter in strerror,
> defaulting to false, like I did in my first version? E.g.
> os::strerror(int e, bool short_name=false); ?

That was what I was thinking of yes. :)

> But talking about clean-up, could we get rid of os::lasterror() ? I
> think that function is a bad idea, and it fortunately is only called in
> two places, both just need strerror() and no Win32 error handling anyway.

Sounds good. lasterror() is a fragile API as it is too easy to introduce 
unexpected system calls between the call to lasterror and the action you 
want to check for the error.

> I also would like to remove os::get_last_error(). It returns errno on
> *nix, win32 error on Windows. Both are different things, plus there is
> also an errno on Windows The return value of this function is useless
> without knowing what the returned error number actually is, so it is not
> of much use.

Simplifying and streamlining all this sounds good.

>     Also it does seem to handle unknown error values.
>
>         - I added os::errno_name() for those cases where one would want the
>         literalized errno value (e.g. "EINVAL") instead of the full text.
>
>
>     Ok. For the unknown case can it return the numeric value that it was
>     passed?
>
>
> I would like to avoid this. This would make the function more complicated.
>
> I'd need to keep this thread-safe, so I either would have to allocate
> thread local storage for the return string somehow or, alternatively,
> keep a fixed string table a la [][] {"unknown (0)", "unknown(1)"} for
> the first n values or so.

Okay. I was thinking this would be simpler but of course we have to 
convert to a string.

I note that linux defines ~133 errno values, and you have entries for 
~75 - are they all the POSIX ones?

I'm wondering whether we should have a fallback strategy for an unknown 
(platform-specific) error by using the platform strerror even though not 
thread-safe (the window of exposure is vastly reduced) ?

> Also, ::strerror() behaves not consistently for the EINVAL case on all
> platforms anyway - on Linux and Bsd, it returns the numerical value, but
> on Solaris, Windows and AIX, it just returns "Unknown". Many callsites I
> saw take this already into account and print out the numerical errno
> value in addition to the error string.

I doubt those callsites are not taking into account unknown values - I 
think you will find they originally only reported the numerical value 
and were then augmented to also print the strerror message.

Thanks,
David

>         - I fixed all callsites in the hotspot which used ::strerror to use
>         os::strerror, or, in some few cases (mostly my own coding for AIX),
>         os::errno_name().
>
>
>     Bit subjective but okay - I personally don't care either way. :)
>
>         - To avoid including os.hpp in debug.hpp, I moved the
>         implemntation of
>         assert_status to debug.cpp
>
>
>     Okay ... did the include cause a problem?
>
>
> os.hpp is a large beast. I wanted to keep it out of debug.hpp, to not
> force this include upon anyone who includes debug.hpp just to get
> assert(). I also try to avoid calling system apis in header files.
>
>
>
>     General minor comments:
>
>     src/os/posix/vm/os_posix.cpp
>
>     Indentation of wrapped call needs fixing
>
>
> Fixed
>
>     ---
>
>     src/share/vm/runtime/os.hpp
>
>     Typo:
>
>     +   //  sugested in the POSIX standard.
>
>     -> suggested
>
>
> Fixed
>
>     ---
>
>     src/share/vm/utilities/vmError.cpp
>
>     Good catch on the code that potentially reports the wrong error
>     using get_last_error! Even the simple print_cr(..., strerror(errno),
>     errno) could in fact print non-matching values if strerror itself
>     caused an error (unlikely of course :) ).
>
>
> Thanks :)
>
>
>     Please check all copyright years - attachListener_solaris.cpp was
>     updated to 2015 instead of 2016 :) Also note correct format for two
>     year copyrights - in src/share/vm/logging/logFileOutput.cpp you have:
>
>     Copyright (c) 2015,2016 Oracle
>
>     but it should be:
>
>     Copyright (c) 2015, 2016, Oracle
>
> Fixed
>
>     Thanks,
>     David
>
>
> Thanks!
>
> Thomas
>
>
>         Kind Regards, Thomas
>
>
>
>
>         On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
>         <coleen.phillimore at oracle.com
>         <mailto:coleen.phillimore at oracle.com>
>         <mailto:coleen.phillimore at oracle.com
>         <mailto:coleen.phillimore at oracle.com>>> wrote:
>
>
>              This latest version seems okay (or at least better than the
>         other
>              workarounds).
>
>              thanks,
>              Coleen
>
>
>              On 3/1/16 12:30 AM, David Holmes wrote:
>
>                  Hi Thomas,
>
>                  On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>
>                      Hi David,
>
>                      On Mon, Feb 29, 2016 at 4:44 AM, David Holmes
>                      <david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com> <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>
>                      <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>
>
>                      <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>>> wrote:
>
>                           Hi Thomas,
>
>                           On 27/02/2016 2:05 AM, Thomas Stüfe wrote:
>
>                               Hi,
>
>                               please take a look at this proposed fix:
>
>                               Bug:
>         https://bugs.openjdk.java.net/browse/JDK-8148425
>                               Webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>
>                               This adds a replacement function
>         os::strerror() as
>                      a drop-in for
>                               strerror(), which has a number of issues.
>
>                               strerror() is unsafe to use and may cause
>         (and has
>                      caused)
>                               crashes in
>                               multithreaded scenarios. It is also not
>         ideal for
>                      log files
>                               because of the
>                               implicit localization of the error messages.
>
>                               For details please see the discussion
>         under the bug
>                      report.
>
>
>                           I just came across strerror_l, which is
>         required to be
>                      thread-safe.
>                           Is that a possible alternative? (I'm not sure how
>                      locale's are
>                           obtained).
>
>
>                      Sorry, I think this API is glibc only. At least I
>         cannot
>                      find this in
>                      our AIX headers, nor on Solaris.
>
>
>                  It is a POSIX API:
>
>         http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>
>                  added in 2006 but part of POSIX.1-2008. But as per
>         Dalibor's
>                  link not necessarily available:
>
>                  "This function is missing on some platforms: glibc
>         2.3.6, Mac OS
>                  X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, Minix
>         3.1.8, AIX
>                  5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11 2011-11,
>         Cygwin,
>                  mingw, MSVC 9, Interix 3.5, BeOS. "
>
>                  Pity.
>
>
>                           Otherwise what you have seems okay - though I do
>                      dislike having to
>                           duplicate all that details already buried in
>         the system
>                           headers/library. Not sure we need to the long
>         text at
>                      the VM level -
>                           which would simplify things a bit.
>
>
>                      I agree, I dislike this too. Like everyone else in this
>                      thread. But I
>                      think this is a pragmatic solution.
>
>                      I am a bit stuck here - should we really get rid of
>         the long
>                      text
>                      feature? There are some callsites of strerror() in the
>                      hotspot where
>                      arguably the long text is better suited:
>                      - in assert_status() (see debug.hpp) - this ends up
>         in the
>                      header of
>                      error files, if this suddenly changes to a literalized
>                      errno, people may
>                      be upset
>
>
>                  I added assert_status purely to expand on what the failing
>                  status was :) I'm happy to see EINVAL in that context
>         rather
>                  than "invalid argument" (or whatever) :)
>
>                      - when failing to write a heap dump file - see
>                      services/heapDumper.cpp.
>                      Which ends as printout on the command line, I think.
>
>
>                  Maybe ... if it is an error the user can fix.
>
>                      The safe option would be to provide both variants
>         (short and
>                      long text).
>                      Or, provide the safe, short variant for all logging
>         calls - when
>                      "EINVAL" is enough, and let users continue to use
>         strerror()
>                      for those
>                      few cases where the long text is needed.
>
>                      What do you think?
>
>
>                  Trying to decide what makes me least unhappy. :) Given
>         you've
>                  already done the work I guess having long and short
>         forms is okay.
>
>                  Thanks,
>                  David
>
>                      Thanks, Thomas
>
>                           Thanks,
>                           David
>
>
>                               Please note that I did not yet change any call
>                      sites, although
>                               all call
>                               sites in the os namespace should already
>         use the
>                      new function. I
>                               wanted to
>                               see whether there would be any general
>         objections.
>
>                               Kind Regards, Thomas
>
>
>
>
>


More information about the hotspot-runtime-dev mailing list