RFR(s): 8148425: strerror() function is not thread-safe
David Holmes
david.holmes at oracle.com
Tue Mar 15 12:31:23 UTC 2016
Hi Thomas,
On 15/03/2016 8:07 PM, Thomas Stüfe wrote:
> Hi David,
>
> sorry, I will have not the time to work on this this week. If the patch
> is acceptable to you in this form, I'd say we commit it, otherwise I'll
> take it up again some time later.
If Coleen is okay with this as-is I'll commit it in the morning.
> About the suggested changes:
>
> - Merging both APIs into one, like suggested, is fine for me and I would
> do this in a follow-up.
Sure.
> - Handling unknown errno values: I do not like the idea of a fallback to
> ::strerror(). Writing the API comment on that one would be weird: "Is
> almost threadsafe", "will probably return a static constant string"...
> :) I would suggest either using the solution as it is now - only
> honouring POSIX errno values or returning "Unknown", or to expand them
> with OS specific lists, either in the OS independent form ("#ifdef EBLUB
> handle(EBLUB) #endif") or by expanding os::strerror() with platform
> dependend os::strerror_pd().
>
> I am pretty unemotional about either option but would prefer avoiding
> ::strerror() altogether.
A follow up on this okay. I'm unsure how best to handle it. Given how
rarely we encounter errors I'm probably over-thinking it. ;)
Thanks,
David
> Kind Regards, Thomas
>
>
>
> On Mon, Mar 14, 2016 at 4:43 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
> On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
>
> Hi David,
>
> thanks again for your review. Here the new version:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>
> And the diff to the last version:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>
> I fixed the indention, copyright comments and the typo.
>
>
> All looks good - thanks.
>
> Further comments inline.
>
>
> Ditto :)
>
>
> On Fri, Mar 11, 2016 at 8:04 AM, David Holmes
> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>> wrote:
>
> Hi Thomas,
>
> Thanks for persevering with this. Sorry for the delayed
> response.
>
> On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>
> Hi all,
>
> please take a look at this next version of the fix.
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>
> I am the first to admit that this is quite ugly :(, but
> it is a
> pragmatic solution.
>
> Some details:
>
> - I added os::strerror() as a drop-in replacement for
> ::strerror() - it
> returns the text as before, but does not localize and
> does not
> handle
> unknown error values; so it can be implemented using a
> simple static
> string table and is always safe to use.
>
>
> Ok. But I don't think we need os::lasterror() if we have
> this. A
> follow up clean-up is okay for that though.
>
>
> I would love to keep this functionality, because I like printing
> out the
> short concise errno values, and there are a number of logging
> changes
> coming at least for AIX. It also does not cost much, as the
> large string
> table is already there.
>
> If the second API does not suit you, would it be accepable for
> you if I
> hide this functionality behind a second parameter in strerror,
> defaulting to false, like I did in my first version? E.g.
> os::strerror(int e, bool short_name=false); ?
>
>
> That was what I was thinking of yes. :)
>
> But talking about clean-up, could we get rid of os::lasterror() ? I
> think that function is a bad idea, and it fortunately is only
> called in
> two places, both just need strerror() and no Win32 error
> handling anyway.
>
>
> Sounds good. lasterror() is a fragile API as it is too easy to
> introduce unexpected system calls between the call to lasterror and
> the action you want to check for the error.
>
> I also would like to remove os::get_last_error(). It returns
> errno on
> *nix, win32 error on Windows. Both are different things, plus
> there is
> also an errno on Windows The return value of this function is
> useless
> without knowing what the returned error number actually is, so
> it is not
> of much use.
>
>
> Simplifying and streamlining all this sounds good.
>
> Also it does seem to handle unknown error values.
>
> - I added os::errno_name() for those cases where one
> would want the
> literalized errno value (e.g. "EINVAL") instead of the
> full text.
>
>
> Ok. For the unknown case can it return the numeric value
> that it was
> passed?
>
>
> I would like to avoid this. This would make the function more
> complicated.
>
> I'd need to keep this thread-safe, so I either would have to
> allocate
> thread local storage for the return string somehow or,
> alternatively,
> keep a fixed string table a la [][] {"unknown (0)",
> "unknown(1)"} for
> the first n values or so.
>
>
> Okay. I was thinking this would be simpler but of course we have to
> convert to a string.
>
> I note that linux defines ~133 errno values, and you have entries
> for ~75 - are they all the POSIX ones?
>
> I'm wondering whether we should have a fallback strategy for an
> unknown (platform-specific) error by using the platform strerror
> even though not thread-safe (the window of exposure is vastly reduced) ?
>
> Also, ::strerror() behaves not consistently for the EINVAL case
> on all
> platforms anyway - on Linux and Bsd, it returns the numerical
> value, but
> on Solaris, Windows and AIX, it just returns "Unknown". Many
> callsites I
> saw take this already into account and print out the numerical errno
> value in addition to the error string.
>
>
> I doubt those callsites are not taking into account unknown values -
> I think you will find they originally only reported the numerical
> value and were then augmented to also print the strerror message.
>
> Thanks,
> David
>
> - I fixed all callsites in the hotspot which used
> ::strerror to use
> os::strerror, or, in some few cases (mostly my own
> coding for AIX),
> os::errno_name().
>
>
> Bit subjective but okay - I personally don't care either
> way. :)
>
> - To avoid including os.hpp in debug.hpp, I moved the
> implemntation of
> assert_status to debug.cpp
>
>
> Okay ... did the include cause a problem?
>
>
> os.hpp is a large beast. I wanted to keep it out of debug.hpp,
> to not
> force this include upon anyone who includes debug.hpp just to get
> assert(). I also try to avoid calling system apis in header files.
>
>
>
> General minor comments:
>
> src/os/posix/vm/os_posix.cpp
>
> Indentation of wrapped call needs fixing
>
>
> Fixed
>
> ---
>
> src/share/vm/runtime/os.hpp
>
> Typo:
>
> + // sugested in the POSIX standard.
>
> -> suggested
>
>
> Fixed
>
> ---
>
> src/share/vm/utilities/vmError.cpp
>
> Good catch on the code that potentially reports the wrong error
> using get_last_error! Even the simple print_cr(...,
> strerror(errno),
> errno) could in fact print non-matching values if strerror
> itself
> caused an error (unlikely of course :) ).
>
>
> Thanks :)
>
>
> Please check all copyright years -
> attachListener_solaris.cpp was
> updated to 2015 instead of 2016 :) Also note correct format
> for two
> year copyrights - in src/share/vm/logging/logFileOutput.cpp
> you have:
>
> Copyright (c) 2015,2016 Oracle
>
> but it should be:
>
> Copyright (c) 2015, 2016, Oracle
>
> Fixed
>
> Thanks,
> David
>
>
> Thanks!
>
> Thomas
>
>
> Kind Regards, Thomas
>
>
>
>
> On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
> <coleen.phillimore at oracle.com
> <mailto:coleen.phillimore at oracle.com>
> <mailto:coleen.phillimore at oracle.com
> <mailto:coleen.phillimore at oracle.com>>
> <mailto:coleen.phillimore at oracle.com
> <mailto:coleen.phillimore at oracle.com>
>
> <mailto:coleen.phillimore at oracle.com
> <mailto:coleen.phillimore at oracle.com>>>> wrote:
>
>
> This latest version seems okay (or at least better
> than the
> other
> workarounds).
>
> thanks,
> Coleen
>
>
> On 3/1/16 12:30 AM, David Holmes wrote:
>
> Hi Thomas,
>
> On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>
> Hi David,
>
> On Mon, Feb 29, 2016 at 4:44 AM, David Holmes
> <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> <mailto:david.holmes at oracle.com <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>>> wrote:
>
> Hi Thomas,
>
> On 27/02/2016 2:05 AM, Thomas Stüfe
> wrote:
>
> Hi,
>
> please take a look at this
> proposed fix:
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8148425
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>
> This adds a replacement function
> os::strerror() as
> a drop-in for
> strerror(), which has a number of
> issues.
>
> strerror() is unsafe to use and
> may cause
> (and has
> caused)
> crashes in
> multithreaded scenarios. It is
> also not
> ideal for
> log files
> because of the
> implicit localization of the
> error messages.
>
> For details please see the discussion
> under the bug
> report.
>
>
> I just came across strerror_l, which is
> required to be
> thread-safe.
> Is that a possible alternative? (I'm
> not sure how
> locale's are
> obtained).
>
>
> Sorry, I think this API is glibc only. At
> least I
> cannot
> find this in
> our AIX headers, nor on Solaris.
>
>
> It is a POSIX API:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>
> added in 2006 but part of POSIX.1-2008. But as per
> Dalibor's
> link not necessarily available:
>
> "This function is missing on some platforms: glibc
> 2.3.6, Mac OS
> X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8,
> Minix
> 3.1.8, AIX
> 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11
> 2011-11,
> Cygwin,
> mingw, MSVC 9, Interix 3.5, BeOS. "
>
> Pity.
>
>
> Otherwise what you have seems okay -
> though I do
> dislike having to
> duplicate all that details already
> buried in
> the system
> headers/library. Not sure we need to
> the long
> text at
> the VM level -
> which would simplify things a bit.
>
>
> I agree, I dislike this too. Like everyone
> else in this
> thread. But I
> think this is a pragmatic solution.
>
> I am a bit stuck here - should we really
> get rid of
> the long
> text
> feature? There are some callsites of
> strerror() in the
> hotspot where
> arguably the long text is better suited:
> - in assert_status() (see debug.hpp) -
> this ends up
> in the
> header of
> error files, if this suddenly changes to a
> literalized
> errno, people may
> be upset
>
>
> I added assert_status purely to expand on what
> the failing
> status was :) I'm happy to see EINVAL in that
> context
> rather
> than "invalid argument" (or whatever) :)
>
> - when failing to write a heap dump file - see
> services/heapDumper.cpp.
> Which ends as printout on the command
> line, I think.
>
>
> Maybe ... if it is an error the user can fix.
>
> The safe option would be to provide both
> variants
> (short and
> long text).
> Or, provide the safe, short variant for
> all logging
> calls - when
> "EINVAL" is enough, and let users continue
> to use
> strerror()
> for those
> few cases where the long text is needed.
>
> What do you think?
>
>
> Trying to decide what makes me least unhappy.
> :) Given
> you've
> already done the work I guess having long and
> short
> forms is okay.
>
> Thanks,
> David
>
> Thanks, Thomas
>
> Thanks,
> David
>
>
> Please note that I did not yet
> change any call
> sites, although
> all call
> sites in the os namespace should
> already
> use the
> new function. I
> wanted to
> see whether there would be any
> general
> objections.
>
> Kind Regards, Thomas
>
>
>
>
>
>
More information about the hotspot-runtime-dev
mailing list