RFR(s): 8148425: strerror() function is not thread-safe
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Mar 15 16:02:31 UTC 2016
Coleen, David,
thank you both!
I will do a follow up change later when there is more time.
Kind regards, Thomas
On Tue, Mar 15, 2016 at 4:02 PM, Coleen Phillimore <
coleen.phillimore at oracle.com> wrote:
>
> This looks fine to me, as-is.
>
> On 3/15/16 8:31 AM, David Holmes wrote:
>
>> Hi Thomas,
>>
>> On 15/03/2016 8:07 PM, Thomas Stüfe wrote:
>>
>>> Hi David,
>>>
>>> sorry, I will have not the time to work on this this week. If the patch
>>> is acceptable to you in this form, I'd say we commit it, otherwise I'll
>>> take it up again some time later.
>>>
>>
>> If Coleen is okay with this as-is I'll commit it in the morning.
>>
>> About the suggested changes:
>>>
>>> - Merging both APIs into one, like suggested, is fine for me and I would
>>> do this in a follow-up.
>>>
>>
>> Sure.
>>
>> - Handling unknown errno values: I do not like the idea of a fallback to
>>> ::strerror(). Writing the API comment on that one would be weird: "Is
>>> almost threadsafe", "will probably return a static constant string"...
>>> :) I would suggest either using the solution as it is now - only
>>> honouring POSIX errno values or returning "Unknown", or to expand them
>>> with OS specific lists, either in the OS independent form ("#ifdef EBLUB
>>> handle(EBLUB) #endif") or by expanding os::strerror() with platform
>>> dependend os::strerror_pd().
>>>
>>> I am pretty unemotional about either option but would prefer avoiding
>>> ::strerror() altogether.
>>>
>>
>> A follow up on this okay. I'm unsure how best to handle it. Given how
>> rarely we encounter errors I'm probably over-thinking it. ;)
>>
>>
> Yeah, I think you're over-thinking it. The two APIs with comments make
> sense to me, and something returning Unknown isn't a big deal in real life.
>
> Thanks for the change and thanks for sponsoring this, David.
>
> Coleen
>
>
> Thanks,
>> David
>>
>> Kind Regards, Thomas
>>>
>>>
>>>
>>> On Mon, Mar 14, 2016 at 4:43 AM, David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>> wrote:
>>>
>>> On 12/03/2016 6:22 PM, Thomas Stüfe wrote:
>>>
>>> Hi David,
>>>
>>> thanks again for your review. Here the new version:
>>>
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.02/webrev/
>>>
>>> And the diff to the last version:
>>>
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01-to-02.txt
>>>
>>> I fixed the indention, copyright comments and the typo.
>>>
>>>
>>> All looks good - thanks.
>>>
>>> Further comments inline.
>>>
>>>
>>> Ditto :)
>>>
>>>
>>> On Fri, Mar 11, 2016 at 8:04 AM, David Holmes
>>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>> <mailto:david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>>> wrote:
>>>
>>> Hi Thomas,
>>>
>>> Thanks for persevering with this. Sorry for the delayed
>>> response.
>>>
>>> On 10/03/2016 2:58 AM, Thomas Stüfe wrote:
>>>
>>> Hi all,
>>>
>>> please take a look at this next version of the fix.
>>>
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.01/webrev/
>>>
>>> I am the first to admit that this is quite ugly :(, but
>>> it is a
>>> pragmatic solution.
>>>
>>> Some details:
>>>
>>> - I added os::strerror() as a drop-in replacement for
>>> ::strerror() - it
>>> returns the text as before, but does not localize and
>>> does not
>>> handle
>>> unknown error values; so it can be implemented using a
>>> simple static
>>> string table and is always safe to use.
>>>
>>>
>>> Ok. But I don't think we need os::lasterror() if we have
>>> this. A
>>> follow up clean-up is okay for that though.
>>>
>>>
>>> I would love to keep this functionality, because I like printing
>>> out the
>>> short concise errno values, and there are a number of logging
>>> changes
>>> coming at least for AIX. It also does not cost much, as the
>>> large string
>>> table is already there.
>>>
>>> If the second API does not suit you, would it be accepable for
>>> you if I
>>> hide this functionality behind a second parameter in strerror,
>>> defaulting to false, like I did in my first version? E.g.
>>> os::strerror(int e, bool short_name=false); ?
>>>
>>>
>>> That was what I was thinking of yes. :)
>>>
>>> But talking about clean-up, could we get rid of os::lasterror()
>>> ? I
>>> think that function is a bad idea, and it fortunately is only
>>> called in
>>> two places, both just need strerror() and no Win32 error
>>> handling anyway.
>>>
>>>
>>> Sounds good. lasterror() is a fragile API as it is too easy to
>>> introduce unexpected system calls between the call to lasterror and
>>> the action you want to check for the error.
>>>
>>> I also would like to remove os::get_last_error(). It returns
>>> errno on
>>> *nix, win32 error on Windows. Both are different things, plus
>>> there is
>>> also an errno on Windows The return value of this function is
>>> useless
>>> without knowing what the returned error number actually is, so
>>> it is not
>>> of much use.
>>>
>>>
>>> Simplifying and streamlining all this sounds good.
>>>
>>> Also it does seem to handle unknown error values.
>>>
>>> - I added os::errno_name() for those cases where one
>>> would want the
>>> literalized errno value (e.g. "EINVAL") instead of the
>>> full text.
>>>
>>>
>>> Ok. For the unknown case can it return the numeric value
>>> that it was
>>> passed?
>>>
>>>
>>> I would like to avoid this. This would make the function more
>>> complicated.
>>>
>>> I'd need to keep this thread-safe, so I either would have to
>>> allocate
>>> thread local storage for the return string somehow or,
>>> alternatively,
>>> keep a fixed string table a la [][] {"unknown (0)",
>>> "unknown(1)"} for
>>> the first n values or so.
>>>
>>>
>>> Okay. I was thinking this would be simpler but of course we have to
>>> convert to a string.
>>>
>>> I note that linux defines ~133 errno values, and you have entries
>>> for ~75 - are they all the POSIX ones?
>>>
>>> I'm wondering whether we should have a fallback strategy for an
>>> unknown (platform-specific) error by using the platform strerror
>>> even though not thread-safe (the window of exposure is vastly
>>> reduced) ?
>>>
>>> Also, ::strerror() behaves not consistently for the EINVAL case
>>> on all
>>> platforms anyway - on Linux and Bsd, it returns the numerical
>>> value, but
>>> on Solaris, Windows and AIX, it just returns "Unknown". Many
>>> callsites I
>>> saw take this already into account and print out the numerical
>>> errno
>>> value in addition to the error string.
>>>
>>>
>>> I doubt those callsites are not taking into account unknown values -
>>> I think you will find they originally only reported the numerical
>>> value and were then augmented to also print the strerror message.
>>>
>>> Thanks,
>>> David
>>>
>>> - I fixed all callsites in the hotspot which used
>>> ::strerror to use
>>> os::strerror, or, in some few cases (mostly my own
>>> coding for AIX),
>>> os::errno_name().
>>>
>>>
>>> Bit subjective but okay - I personally don't care either
>>> way. :)
>>>
>>> - To avoid including os.hpp in debug.hpp, I moved the
>>> implemntation of
>>> assert_status to debug.cpp
>>>
>>>
>>> Okay ... did the include cause a problem?
>>>
>>>
>>> os.hpp is a large beast. I wanted to keep it out of debug.hpp,
>>> to not
>>> force this include upon anyone who includes debug.hpp just to get
>>> assert(). I also try to avoid calling system apis in header
>>> files.
>>>
>>>
>>>
>>> General minor comments:
>>>
>>> src/os/posix/vm/os_posix.cpp
>>>
>>> Indentation of wrapped call needs fixing
>>>
>>>
>>> Fixed
>>>
>>> ---
>>>
>>> src/share/vm/runtime/os.hpp
>>>
>>> Typo:
>>>
>>> + // sugested in the POSIX standard.
>>>
>>> -> suggested
>>>
>>>
>>> Fixed
>>>
>>> ---
>>>
>>> src/share/vm/utilities/vmError.cpp
>>>
>>> Good catch on the code that potentially reports the wrong
>>> error
>>> using get_last_error! Even the simple print_cr(...,
>>> strerror(errno),
>>> errno) could in fact print non-matching values if strerror
>>> itself
>>> caused an error (unlikely of course :) ).
>>>
>>>
>>> Thanks :)
>>>
>>>
>>> Please check all copyright years -
>>> attachListener_solaris.cpp was
>>> updated to 2015 instead of 2016 :) Also note correct format
>>> for two
>>> year copyrights - in src/share/vm/logging/logFileOutput.cpp
>>> you have:
>>>
>>> Copyright (c) 2015,2016 Oracle
>>>
>>> but it should be:
>>>
>>> Copyright (c) 2015, 2016, Oracle
>>>
>>> Fixed
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> Thanks!
>>>
>>> Thomas
>>>
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>>
>>>
>>> On Tue, Mar 8, 2016 at 11:01 PM, Coleen Phillimore
>>> <coleen.phillimore at oracle.com
>>> <mailto:coleen.phillimore at oracle.com>
>>> <mailto:coleen.phillimore at oracle.com
>>> <mailto:coleen.phillimore at oracle.com>>
>>> <mailto:coleen.phillimore at oracle.com
>>> <mailto:coleen.phillimore at oracle.com>
>>>
>>> <mailto:coleen.phillimore at oracle.com
>>> <mailto:coleen.phillimore at oracle.com>>>> wrote:
>>>
>>>
>>> This latest version seems okay (or at least better
>>> than the
>>> other
>>> workarounds).
>>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> On 3/1/16 12:30 AM, David Holmes wrote:
>>>
>>> Hi Thomas,
>>>
>>> On 1/03/2016 3:23 AM, Thomas Stüfe wrote:
>>>
>>> Hi David,
>>>
>>> On Mon, Feb 29, 2016 at 4:44 AM, David
>>> Holmes
>>> <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>
>>> <mailto:david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>>
>>> <mailto:david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>> <mailto:david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>>>
>>> <mailto:david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>
>>> <mailto:david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>>
>>>
>>> <mailto:david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>
>>> <mailto:david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>>>>> wrote:
>>>
>>> Hi Thomas,
>>>
>>> On 27/02/2016 2:05 AM, Thomas Stüfe
>>> wrote:
>>>
>>> Hi,
>>>
>>> please take a look at this
>>> proposed fix:
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8148425
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8141425-strerror-replacement/webrev.00/webrev/
>>>
>>> This adds a replacement function
>>> os::strerror() as
>>> a drop-in for
>>> strerror(), which has a number of
>>> issues.
>>>
>>> strerror() is unsafe to use and
>>> may cause
>>> (and has
>>> caused)
>>> crashes in
>>> multithreaded scenarios. It is
>>> also not
>>> ideal for
>>> log files
>>> because of the
>>> implicit localization of the
>>> error messages.
>>>
>>> For details please see the
>>> discussion
>>> under the bug
>>> report.
>>>
>>>
>>> I just came across strerror_l, which
>>> is
>>> required to be
>>> thread-safe.
>>> Is that a possible alternative? (I'm
>>> not sure how
>>> locale's are
>>> obtained).
>>>
>>>
>>> Sorry, I think this API is glibc only. At
>>> least I
>>> cannot
>>> find this in
>>> our AIX headers, nor on Solaris.
>>>
>>>
>>> It is a POSIX API:
>>>
>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror_l.html
>>>
>>> added in 2006 but part of POSIX.1-2008. But as
>>> per
>>> Dalibor's
>>> link not necessarily available:
>>>
>>> "This function is missing on some platforms:
>>> glibc
>>> 2.3.6, Mac OS
>>> X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8,
>>> Minix
>>> 3.1.8, AIX
>>> 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11
>>> 2011-11,
>>> Cygwin,
>>> mingw, MSVC 9, Interix 3.5, BeOS. "
>>>
>>> Pity.
>>>
>>>
>>> Otherwise what you have seems okay -
>>> though I do
>>> dislike having to
>>> duplicate all that details already
>>> buried in
>>> the system
>>> headers/library. Not sure we need to
>>> the long
>>> text at
>>> the VM level -
>>> which would simplify things a bit.
>>>
>>>
>>> I agree, I dislike this too. Like everyone
>>> else in this
>>> thread. But I
>>> think this is a pragmatic solution.
>>>
>>> I am a bit stuck here - should we really
>>> get rid of
>>> the long
>>> text
>>> feature? There are some callsites of
>>> strerror() in the
>>> hotspot where
>>> arguably the long text is better suited:
>>> - in assert_status() (see debug.hpp) -
>>> this ends up
>>> in the
>>> header of
>>> error files, if this suddenly changes to a
>>> literalized
>>> errno, people may
>>> be upset
>>>
>>>
>>> I added assert_status purely to expand on what
>>> the failing
>>> status was :) I'm happy to see EINVAL in that
>>> context
>>> rather
>>> than "invalid argument" (or whatever) :)
>>>
>>> - when failing to write a heap dump file -
>>> see
>>> services/heapDumper.cpp.
>>> Which ends as printout on the command
>>> line, I think.
>>>
>>>
>>> Maybe ... if it is an error the user can fix.
>>>
>>> The safe option would be to provide both
>>> variants
>>> (short and
>>> long text).
>>> Or, provide the safe, short variant for
>>> all logging
>>> calls - when
>>> "EINVAL" is enough, and let users continue
>>> to use
>>> strerror()
>>> for those
>>> few cases where the long text is needed.
>>>
>>> What do you think?
>>>
>>>
>>> Trying to decide what makes me least unhappy.
>>> :) Given
>>> you've
>>> already done the work I guess having long and
>>> short
>>> forms is okay.
>>>
>>> Thanks,
>>> David
>>>
>>> Thanks, Thomas
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> Please note that I did not yet
>>> change any call
>>> sites, although
>>> all call
>>> sites in the os namespace should
>>> already
>>> use the
>>> new function. I
>>> wanted to
>>> see whether there would be any
>>> general
>>> objections.
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list