RFR: 8297963: Partially fix string expansion issues in the autoconf UTIL macros [v3]
Julian Waters
jwaters at openjdk.org
Fri Dec 2 14:04:17 UTC 2022
On Fri, 2 Dec 2022 13:30:55 GMT, Magnus Ihse Bursie <ihse at openjdk.org> wrote:
>> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>>
>> - Merge branch 'openjdk:master' into patch-2
>> - FLAGS_COMPILER_CHECK_ARGUMENTS should not quote ARG_ arguments
>> - Remove special ARG_ handling in UTIL_DEFUN_NAMED
>> - Wording
>> - Partially fix string expansion issues in autoconf UTIL macros
>
> make/autoconf/util.m4 line 71:
>
>> 69: # that you need to dive into it to fix anything
>> 70: # ~Julian
>> 71: m4_foreach([arg], m4_dquote(m4_dquote_elt($3)), [
>
> I actually had to check up what `m4_dquote` does. It is ["meant for hard-core M4 programmers."](https://www.gnu.org/software/autoconf/manual/autoconf-2.60/html_node/Evaluation-Macros.html). No sh*t...
>
> I think your changes are okay, but it's hard to say. I can't read this kind of m4 incantations fluently. I assume you have tested this thoroughly?
>
> My only worry here is that you seem to be adding many layers of quoting (two cases of nested `m4_dquote`, each of which adds "double" quoting) -- will m4 really be able to properly "dequote" this, so there won't be some superfluous `[ ... ]` suddenly, where there shouldn't be?
I've tested this quite extensively before the final draft of his change and can verify that no extra quotes are added, but I can explain what each level of quoting does, so you don't have to go through the absolute nightmare of trying to decipher them like I did:
$3 is initially a comma separated string list of quoted arguments, the `m4_dquote_elt` call wraps each one in another layer of quoting, which prevents the `m4_dquote` immediately after this from expanding each one of them. When called on a comma separated string `m4_dquote` acts the same as [$3], but each element in the list has 2 layers of extra quoting, which is what we want.
The foreach call now immediately strips one level of quoting away from each element. This can be confirmed with an `m4_dumpdef` in the first line of the foreach code block. So far so good.
Now comes the check for if the parameter name was correctly separated from the values. If it was, our arg still has 1 extra quoting layer, which is good. If not however, arg is nested inside another `m4_dquote` before the patsubst; It immediately expands and loses that quoted layer, which the call to `m4_dquote` immediately restores. It then loses that quote in patsubst again, so another `m4_dquote` is invoked on it to get that layer of quoting back. All in all, whichever path was taken, we still have 1 extra layer of quoting, which then helps use breeze through the macro expansions afterwards, until `m4_pushdef` is encountered. Again arg loses the extra layer of quoting, and immediately has it restored by `m4_dquote`, bringing it back to 1 extra layer of quoting. A second `m4_dquote` after this now brings it to 2 extra levels of quoting. Perfect! The 2 `m4_bpatsubst` calls after this use up exactly 2 levels of quoting, and just like that, we've managed to pass the exact value we've
received into the ARG_ macros with the exact same level of quoting, as if it were never touched at all. A consequence of this (I don't know if you'd consider this good or bad) is that nested cells to macros implemented with UTIL_DEFUN_NAMED no longer need to quote their ARG_ parameters to the function being called, because macro expansions are now exact (This is reflected in the last push I made to this branch, in flags.m4, FLAGS_COMPILER_CHECK_ARGUMENTS)
Contrary to the name, if safely used `m4_dquote` only adds an additional layer of quoting for each call, which was the really infuriating part (`m4_quote` ignores quoted arguments entirely, forcing you to use dquote instead!)
I committed to this branch earlier just now, and one of the changes was removing a call to `m4_dumpdef([ARG_][]arg_name)` just after the `m4_pushdef` that I forgot about and left in, that was what I've been using to verify that the values passed to the ARG_ macros are ultimately correct
-------------
PR: https://git.openjdk.org/jdk/pull/11458
More information about the build-dev
mailing list