RFR: 8329961: Buffer overflow in os::Linux::kernel_version [v3]

Wed Apr 10 14:06:12 UTC 2024

On Wed, 10 Apr 2024 13:30:12 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

> > > > > > Hmm, wouldn't sscanf not be simpler and safer? No need to factor out the parser. IMHO no need to even add a gtest since parsing would be really simple and not loop based. E.g.
> > > > > > ```
> > > > > > if (sscanf(release, "%d.%d", &major, &minor) != 2) {
> > > > > >   log_warning blabla
> > > > > > }
> > > > > > ```
> > > > > > 
> > > > > > 
> > > > > >     
> > > > > >       
> > > > > >     
> > > > > > 
> > > > > >       
> > > > > >     
> > > > > > 
> > > > > >     
> > > > > >   
> > > > > > As bonus, you avoid accidental conversion from hex numbers and such that strotol provides and that we don't really want here.
> > > > > 
> > > > > 
> > > > > Hi, according to C11 standard (and my man pages) it is UB to call the scanf-family of functions with "invalid" data and `strtol` is recommended instead. So, unfortunately, it might not be safer.
> > > > 
> > > > 
> > > > Can you point me to the relevant man page? If that were not to work, it would drastically limit the usefulness of scanf, and a lot of code in hotspot would be UB.
> > > > My manpage:
> > > > ```
> > > > The format string consists of a sequence of directives which describe how to process the sequence of input characters. If processing of a directive fails, no further input is read, and scanf() returns. A "failure" can be either of the following: input failure, meaning that input characters were unavailable, or matching failure, meaning that the input was inappropriate (see below). 
> > > > ```
> > > > 
> > > > 
> > > >     
> > > >       
> > > >     
> > > > 
> > > >       
> > > >     
> > > > 
> > > >     
> > > >   
> > > > Matching failures are part of the API.
> > > > > Another thing: We shouldn't call `uname`. Reading `/proc/sys/kernel/osrelease` is sufficient.
> > > > 
> > > > 
> > > > I agree with @robehn, uname is Posix and portable. May also be cached inside glibc, saving us a proc fs read.
> > > 
> > > 
> > > See BUGS section: https://man7.org/linux/man-pages/man3/sscanf.3.html
> > 
> > 
> > Okay that. Note that deprecation of numerical conversions has been hugely controversial, and it was a step taken by glibc manpage maintainers. This is just about numerical overflow. E.g. 3333333333333333333333333333333.333333333333333333333333333333 will overflow int range, and return true. Are we honestly concerned about this here?
> > By avoiding sscanf, we swap very readable and simple-hence-safe code with manual parsing, which is both unreadable and, as this issue shows, in practice a lot unsafer than had we used sscanf to begin with.
> > > OK, I'm probably too paranoid regarding `uname`. I don't like that we've got a C string and there's no way of knowing whether it actually ends in a NUL byte or not. At least with a file you know that it has a specific length and can account for that. We probably have to assume that if someone is being malicious with `uname` then we're compromised regardless.
> > 
> > 
> > If you are paranoid, do this:
> > ```
> > struct {
> >   utsname uts; char c;
> > } v;
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > then set v.c to 0 before calling uname.
> 
> Aha, it's just overflow issues? Well, then let's just use `sscanf`. Thanks for the hint, interesting code pattern! That won't work as the string isn't necessarily embedded within the struct, but it's an interesting strategy nonetheless.

I think it is required to be inline, though. See https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/utsname.h.html , definition is to be an array, not a pointer:

char  release[]  Current release level of this implementation. 

> I'll just trust that it ends at some point.

It has to. User address space is ending at 128TB usually. :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18697#issuecomment-2047639071