RFR: 8297963: Partially fix string expansion issues in the autoconf UTIL macros [v3]
Magnus Ihse Bursie
ihse at openjdk.org
Fri Dec 2 14:37:21 UTC 2022
On Fri, 2 Dec 2022 14:01:42 GMT, Julian Waters <jwaters at openjdk.org> wrote:
>> make/autoconf/util.m4 line 71:
>>
>>> 69: # that you need to dive into it to fix anything
>>> 70: # ~Julian
>>> 71: m4_foreach([arg], m4_dquote(m4_dquote_elt($3)), [
>>
>> I actually had to check up what `m4_dquote` does. It is ["meant for hard-core M4 programmers."](https://www.gnu.org/software/autoconf/manual/autoconf-2.60/html_node/Evaluation-Macros.html). No sh*t...
>>
>> I think your changes are okay, but it's hard to say. I can't read this kind of m4 incantations fluently. I assume you have tested this thoroughly?
>>
>> My only worry here is that you seem to be adding many layers of quoting (two cases of nested `m4_dquote`, each of which adds "double" quoting) -- will m4 really be able to properly "dequote" this, so there won't be some superfluous `[ ... ]` suddenly, where there shouldn't be?
>
> I've tested this quite extensively before the final draft of his change and can verify that no extra quotes are added, but I can explain what each level of quoting does, so you don't have to go through the absolute nightmare of trying to decipher them like I did:
>
> $3 is initially a comma separated string list of quoted arguments, the `m4_dquote_elt` call wraps each one in another layer of quoting, which prevents the `m4_dquote` immediately after this from expanding each one of them. When called on a comma separated string `m4_dquote` acts the same as [$3], but each element in the list has 2 layers of extra quoting, which is what we want.
>
> The foreach call now immediately strips one level of quoting away from each element. This can be confirmed with an `m4_dumpdef` in the first line of the foreach code block. So far so good.
>
> Now comes the check for if the parameter name was correctly separated from the values. If it was, our arg still has 1 extra quoting layer, which is good. If not however, arg is nested inside another `m4_dquote` before the patsubst; It immediately expands and loses that quoted layer, which the call to `m4_dquote` immediately restores. It then loses that quote in patsubst again, so another `m4_dquote` is invoked on it to get that layer of quoting back. All in all, whichever path was taken, we still have 1 extra layer of quoting, which then helps use breeze through the macro expansions afterwards, until `m4_pushdef` is encountered. Again arg loses the extra layer of quoting, and immediately has it restored by `m4_dquote`, bringing it back to 1 extra layer of quoting. A second `m4_dquote` after this now brings it to 2 extra levels of quoting. Perfect! The 2 `m4_bpatsubst` calls after this use up exactly 2 levels of quoting, and just like that, we've managed to pass the exact value we'v
e received into the ARG_ macros with the exact same level of quoting, as if it were never touched at all. A consequence of this (I don't know if you'd consider this good or bad) is that nested cells to macros implemented with UTIL_DEFUN_NAMED no longer need to quote their ARG_ parameters to the function being called, because macro expansions are now exact (This is reflected in the last push I made to this branch, in flags.m4, FLAGS_COMPILER_CHECK_ARGUMENTS)
>
> Contrary to the name, if safely used `m4_dquote` only adds an additional layer of quoting for each call, which was the really infuriating part (`m4_quote` ignores quoted arguments entirely, forcing you to use dquote instead!)
>
> I committed to this branch earlier just now, and one of the changes was removing a call to `m4_dumpdef([ARG_][]arg_name)` just after the `m4_pushdef` that I forgot about and left in, that was what I've been using to verify that the values passed to the ARG_ macros are ultimately correct
Thank you! That was a very comprehensive explanation. I think that (rather than the more poetically apt soul-eating demon story) would actually be helpful to have in the comments. But since it is quite long, maybe you can just add a pointer to the canonicalized URL to this post?
`# For details on how this work, see https://git.openjdk.org/jdk/pull/11458#discussion_r1038173051`
or something like that.
-------------
PR: https://git.openjdk.org/jdk/pull/11458
More information about the build-dev
mailing list