RFR(s): 8148425: strerror() function is not thread-safe
Coleen Phillimore
coleen.phillimore at oracle.com
Tue Mar 15 15:02:37 UTC 2016
This looks fine to me, as-is.
On 3/15/16 8:31 AM, David Holmes wrote:
> Hi Thomas,
>
> On 15/03/2016 8:07 PM, Thomas Stüfe wrote:
>> Hi David,
>>
>> sorry, I will have not the time to work on this this week. If the patch
>> is acceptable to you in this form, I'd say we commit it, otherwise I'll
>> take it up again some time later.
>
> If Coleen is okay with this as-is I'll commit it in the morning.
>
>> About the suggested changes:
>>
>> - Merging both APIs into one, like suggested, is fine for me and I would
>> do this in a follow-up.
>
> Sure.
>
>> - Handling unknown errno values: I do not like the idea of a fallback to
>> ::strerror(). Writing the API comment on that one would be weird: "Is
>> almost threadsafe", "will probably return a static constant string"...
>> :) I would suggest either using the solution as it is now - only
>> honouring POSIX errno values or returning "Unknown", or to expand them
>> with OS specific lists, either in the OS independent form ("#ifdef EBLUB
>> handle(EBLUB) #endif") or by expanding os::strerror() with platform
>> dependend os::strerror_pd().
>>
>> I am pretty unemotional about either option but would prefer avoiding
>> ::strerror() altogether.
>
> A follow up on this okay. I'm unsure how best to handle it. Given how
> rarely we encounter errors I'm probably over-thinking it. ;)
>
Yeah, I think you're over-thinking it. The two APIs with comments make
sense to me, and something returning Unknown isn't a big deal in real life.
Thanks for the change and thanks for sponsoring this, David.
Coleen
> Thanks,
> David
>
>> Kind Regards, Thomas
>>
>>
>>
>> On Mon, Mar 14, 2016 at 4:43 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
>>
>> Hi David,
>>
>> thanks again for your review. Here the new version:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>>
>> And the diff to the last version:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>>
>> I fixed the indention, copyright comments and the typo.
>>
>>
>> All looks good - thanks.
>>
>> Further comments inline.
>>
>>
>> Ditto :)
>>
>>
>> On Fri, Mar 11, 2016 at 8:04 AM, David Holmes
>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>> wrote:
>>
>> Hi Thomas,
>>
>> Thanks for persevering with this. Sorry for the delayed
>> response.
>>
>> On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>>
>> Hi all,
>>
>> please take a look at this next version of the fix.
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>>
>> I am the first to admit that this is quite ugly :(, but
>> it is a
>> pragmatic solution.
>>
>> Some details:
>>
>> - I added os::strerror() as a drop-in replacement for
>> ::strerror() - it
>> returns the text as before, but does not localize and
>> does not
>> handle
>> unknown error values; so it can be implemented using a
>> simple static
>> string table and is always safe to use.
>>
>>
>> Ok. But I don't think we need os::lasterror() if we have
>> this. A
>> follow up clean-up is okay for that though.
>>
>>
>> I would love to keep this functionality, because I like printing
>> out the
>> short concise errno values, and there are a number of logging
>> changes
>> coming at least for AIX. It also does not cost much, as the
>> large string
>> table is already there.
>>
>> If the second API does not suit you, would it be accepable for
>> you if I
>> hide this functionality behind a second parameter in strerror,
>> defaulting to false, like I did in my first version? E.g.
>> os::strerror(int e, bool short_name=false); ?
>>
>>
>> That was what I was thinking of yes. :)
>>
>> But talking about clean-up, could we get rid of
>> os::lasterror() ? I
>> think that function is a bad idea, and it fortunately is only
>> called in
>> two places, both just need strerror() and no Win32 error
>> handling anyway.
>>
>>
>> Sounds good. lasterror() is a fragile API as it is too easy to
>> introduce unexpected system calls between the call to lasterror and
>> the action you want to check for the error.
>>
>> I also would like to remove os::get_last_error(). It returns
>> errno on
>> *nix, win32 error on Windows. Both are different things, plus
>> there is
>> also an errno on Windows The return value of this function is
>> useless
>> without knowing what the returned error number actually is, so
>> it is not
>> of much use.
>>
>>
>> Simplifying and streamlining all this sounds good.
>>
>> Also it does seem to handle unknown error values.
>>
>> - I added os::errno_name() for those cases where one
>> would want the
>> literalized errno value (e.g. "EINVAL") instead of the
>> full text.
>>
>>
>> Ok. For the unknown case can it return the numeric value
>> that it was
>> passed?
>>
>>
>> I would like to avoid this. This would make the function more
>> complicated.
>>
>> I'd need to keep this thread-safe, so I either would have to
>> allocate
>> thread local storage for the return string somehow or,
>> alternatively,
>> keep a fixed string table a la [][] {"unknown (0)",
>> "unknown(1)"} for
>> the first n values or so.
>>
>>
>> Okay. I was thinking this would be simpler but of course we have to
>> convert to a string.
>>
>> I note that linux defines ~133 errno values, and you have entries
>> for ~75 - are they all the POSIX ones?
>>
>> I'm wondering whether we should have a fallback strategy for an
>> unknown (platform-specific) error by using the platform strerror
>> even though not thread-safe (the window of exposure is vastly
>> reduced) ?
>>
>> Also, ::strerror() behaves not consistently for the EINVAL case
>> on all
>> platforms anyway - on Linux and Bsd, it returns the numerical
>> value, but
>> on Solaris, Windows and AIX, it just returns "Unknown". Many
>> callsites I
>> saw take this already into account and print out the
>> numerical errno
>> value in addition to the error string.
>>
>>
>> I doubt those callsites are not taking into account unknown values -
>> I think you will find they originally only reported the numerical
>> value and were then augmented to also print the strerror message.
>>
>> Thanks,
>> David
>>
>> - I fixed all callsites in the hotspot which used
>> ::strerror to use
>> os::strerror, or, in some few cases (mostly my own
>> coding for AIX),
>> os::errno_name().
>>
>>
>> Bit subjective but okay - I personally don't care either
>> way. :)
>>
>> - To avoid including os.hpp in debug.hpp, I moved the
>> implemntation of
>> assert_status to debug.cpp
>>
>>
>> Okay ... did the include cause a problem?
>>
>>
>> os.hpp is a large beast. I wanted to keep it out of debug.hpp,
>> to not
>> force this include upon anyone who includes debug.hpp just to
>> get
>> assert(). I also try to avoid calling system apis in header
>> files.
>>
>>
>>
>> General minor comments:
>>
>> src/os/posix/vm/os_posix.cpp
>>
>> Indentation of wrapped call needs fixing
>>
>>
>> Fixed
>>
>> ---
>>
>> src/share/vm/runtime/os.hpp
>>
>> Typo:
>>
>> + // sugested in the POSIX standard.
>>
>> -> suggested
>>
>>
>> Fixed
>>
>> ---
>>
>> src/share/vm/utilities/vmError.cpp
>>
>> Good catch on the code that potentially reports the
>> wrong error
>> using get_last_error! Even the simple print_cr(...,
>> strerror(errno),
>> errno) could in fact print non-matching values if strerror
>> itself
>> caused an error (unlikely of course :) ).
>>
>>
>> Thanks :)
>>
>>
>> Please check all copyright years -
>> attachListener_solaris.cpp was
>> updated to 2015 instead of 2016 :) Also note correct format
>> for two
>> year copyrights - in src/share/vm/logging/logFileOutput.cpp
>> you have:
>>
>> Copyright (c) 2015,2016 Oracle
>>
>> but it should be:
>>
>> Copyright (c) 2015, 2016, Oracle
>>
>> Fixed
>>
>> Thanks,
>> David
>>
>>
>> Thanks!
>>
>> Thomas
>>
>>
>> Kind Regards, Thomas
>>
>>
>>
>>
>> On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
>> <coleen.phillimore at oracle.com
>> <mailto:coleen.phillimore at oracle.com>
>> <mailto:coleen.phillimore at oracle.com
>> <mailto:coleen.phillimore at oracle.com>>
>> <mailto:coleen.phillimore at oracle.com
>> <mailto:coleen.phillimore at oracle.com>
>>
>> <mailto:coleen.phillimore at oracle.com
>> <mailto:coleen.phillimore at oracle.com>>>> wrote:
>>
>>
>> This latest version seems okay (or at least better
>> than the
>> other
>> workarounds).
>>
>> thanks,
>> Coleen
>>
>>
>> On 3/1/16 12:30 AM, David Holmes wrote:
>>
>> Hi Thomas,
>>
>> On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>>
>> Hi David,
>>
>> On Mon, Feb 29, 2016 at 4:44 AM, David
>> Holmes
>> <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>
>> <mailto:david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>
>>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>>>> wrote:
>>
>> Hi Thomas,
>>
>> On 27/02/2016 2:05 AM, Thomas Stüfe
>> wrote:
>>
>> Hi,
>>
>> please take a look at this
>> proposed fix:
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8148425
>> Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>>
>> This adds a replacement function
>> os::strerror() as
>> a drop-in for
>> strerror(), which has a number of
>> issues.
>>
>> strerror() is unsafe to use and
>> may cause
>> (and has
>> caused)
>> crashes in
>> multithreaded scenarios. It is
>> also not
>> ideal for
>> log files
>> because of the
>> implicit localization of the
>> error messages.
>>
>> For details please see the
>> discussion
>> under the bug
>> report.
>>
>>
>> I just came across strerror_l,
>> which is
>> required to be
>> thread-safe.
>> Is that a possible alternative? (I'm
>> not sure how
>> locale's are
>> obtained).
>>
>>
>> Sorry, I think this API is glibc only. At
>> least I
>> cannot
>> find this in
>> our AIX headers, nor on Solaris.
>>
>>
>> It is a POSIX API:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>>
>> added in 2006 but part of POSIX.1-2008. But
>> as per
>> Dalibor's
>> link not necessarily available:
>>
>> "This function is missing on some
>> platforms: glibc
>> 2.3.6, Mac OS
>> X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8,
>> Minix
>> 3.1.8, AIX
>> 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11
>> 2011-11,
>> Cygwin,
>> mingw, MSVC 9, Interix 3.5, BeOS. "
>>
>> Pity.
>>
>>
>> Otherwise what you have seems okay -
>> though I do
>> dislike having to
>> duplicate all that details already
>> buried in
>> the system
>> headers/library. Not sure we need to
>> the long
>> text at
>> the VM level -
>> which would simplify things a bit.
>>
>>
>> I agree, I dislike this too. Like everyone
>> else in this
>> thread. But I
>> think this is a pragmatic solution.
>>
>> I am a bit stuck here - should we really
>> get rid of
>> the long
>> text
>> feature? There are some callsites of
>> strerror() in the
>> hotspot where
>> arguably the long text is better suited:
>> - in assert_status() (see debug.hpp) -
>> this ends up
>> in the
>> header of
>> error files, if this suddenly changes to a
>> literalized
>> errno, people may
>> be upset
>>
>>
>> I added assert_status purely to expand on what
>> the failing
>> status was :) I'm happy to see EINVAL in that
>> context
>> rather
>> than "invalid argument" (or whatever) :)
>>
>> - when failing to write a heap dump
>> file - see
>> services/heapDumper.cpp.
>> Which ends as printout on the command
>> line, I think.
>>
>>
>> Maybe ... if it is an error the user can fix.
>>
>> The safe option would be to provide both
>> variants
>> (short and
>> long text).
>> Or, provide the safe, short variant for
>> all logging
>> calls - when
>> "EINVAL" is enough, and let users continue
>> to use
>> strerror()
>> for those
>> few cases where the long text is needed.
>>
>> What do you think?
>>
>>
>> Trying to decide what makes me least unhappy.
>> :) Given
>> you've
>> already done the work I guess having long and
>> short
>> forms is okay.
>>
>> Thanks,
>> David
>>
>> Thanks, Thomas
>>
>> Thanks,
>> David
>>
>>
>> Please note that I did not yet
>> change any call
>> sites, although
>> all call
>> sites in the os namespace should
>> already
>> use the
>> new function. I
>> wanted to
>> see whether there would be any
>> general
>> objections.
>>
>> Kind Regards, Thomas
>>
>>
>>
>>
>>
>>
More information about the hotspot-runtime-dev
mailing list