P.S.: RFR [9] 8133651: automated replacing of old-style tags in docs

Martin Buchholz martinrb at google.com
Thu Oct 1 06:54:15 UTC 2015


Hi s'marks,
You probably don't need to absolutify paths.
And you can easily handle multiple args.

(just for fun!)
Checks for javadoc comment; handles popular html entities; handles multiple
lines; handles both tt and code:

#!/bin/bash
find "$@" -name '*.java' | \
  xargs -r perl -p0777i -e \
  'do {} while s~^ *\*.*\K<(tt|code)>((?:[^<>{}\&\@]|&(?:lt|gt|amp);)*)</\1>~$_=$2;
s/&lt;/</g; s/&gt;/>/g; s/&amp;/&/g; "{\@code $_}"~mgie'
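
For comparison, here is a rough Java sketch of the same idea: only touch
javadoc lines, accept both tt and code in any case, let the tag body span
lines, and decode the three common entities. It is an approximation, not a
port verified against the Perl, and not the code from any webrev in this
thread. The names TagSketch and convert are made up for illustration, and
Matcher.replaceAll with a function argument assumes JDK 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagSketch {
    // Group 1: a javadoc line prefix (" * ...") up to the tag. Java has no \K,
    //          so the prefix is captured and put back in the replacement.
    // Group 2: tt or code, matched in any case.
    // Group 3: a body with no markup, braces or '@', except &lt; &gt; &amp;.
    //          The character class also matches newlines, so bodies may span lines.
    private static final Pattern TAG = Pattern.compile(
        "(^ *\\*.*?)<(tt|code)>((?:[^<>{}&@]|&(?:lt|gt|amp);)*)</\\2>",
        Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    static String convert(String text) {
        String prev;
        do {
            // Each pass rewrites at most one tag per javadoc line, so repeat
            // until nothing changes (the role of "do {} while" in the Perl).
            prev = text;
            text = TAG.matcher(prev).replaceAll(m -> {
                String body = m.group(3)
                    .replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&amp;", "&");   // decode &amp; last
                return Matcher.quoteReplacement(m.group(1) + "{@code " + body + "}");
            });
        } while (!text.equals(prev));
        return text;
    }

    public static void main(String[] args) {
        System.out.print(convert("/**\n * Returns <tt>true</tt> if <CODE>a &lt; b</CODE>.\n */\n"));
    }
}
```

Tags with nested markup, such as <tt><i>someCode</i></tt>, do not match the
pattern and are left alone for human editing.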


On Wed, Sep 30, 2015 at 6:16 PM, Stuart Marks <stuart.marks at oracle.com>
wrote:

> Hi Alexander, Martin,
>
> The challenge of Perl file slurping and Emacs one-liners was too much to
> bear.
>
> This is Java, so one-liners are hardly possible. Still, there are a bunch
> of improvements that can be made to the Java version. (OK, and I'm showing
> off a bit.)
>
> Take a look at this:
>
> http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java
>
> I haven't studied the output exhaustively, but it seems to do a reasonably
> good job for the common cases. I ran it over java.lang and I noticed a few
> cases where there is markup embedded within <code></code> text, which
> should be looked at more closely.
>
> I don't particularly care if you use my version, but there are some
> techniques that I'd strongly recommend that you consider using in any such
> tool. In particular:
>
>  - Pattern.DOTALL to do multi-line matches
>  - Pattern.CASE_INSENSITIVE
>  - try-with-resources to ensure that files are closed properly
>  - NIO instead of old java.io APIs, particularly Files.walk() and streams
>  - use Scanner to deal with input file buffering
>  - Scanner's stream support (I recently added this to JDK 9)
>
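
A minimal sketch of how a few items on this list fit together. This is not
the contents of SimpleTagEditorSmarks1.java; WalkSketch, javaFiles and
countCodeTags are hypothetical names, and the pattern is deliberately naive
(it only counts <code> spans rather than rewriting them). Scanner.findAll
assumes JDK 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WalkSketch {
    // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE also catches <CODE> and <Code>.
    static final Pattern CODE_TAG = Pattern.compile(
        "<code>(.*?)</code>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Collects the .java files under root; try-with-resources closes the directory stream.
    static List<Path> javaFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts <code>...</code> spans; the Scanner takes care of input buffering.
    static long countCodeTags(Scanner in) {
        return in.findAll(CODE_TAG).count();
    }

    // Same count for one file; the Scanner (and the file) is closed on exit.
    static long countCodeTags(Path file) {
        try (Scanner in = new Scanner(file, StandardCharsets.UTF_8.name())) {
            return countCodeTags(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : javaFiles(root)) {
            System.out.println(countCodeTags(file) + "\t" + file);
        }
    }
}
```
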
> Enjoy,
>
> s'marks
>
>
>
> On 9/29/15 2:23 PM, Martin Buchholz wrote:
>
>> Hi Alexander,
>>
>> your change looks good.  It's OK to have manual corrections for automated
>> mega-changes like this, as long as they all revert changes.
>>
>> Random comments:
>>
>> Should you publish your specdiff?  I guess not - it would be empty!
>>
>>              while((s = br.readLine()) != null) {
>>
>> by matching only one line at a time, you lose the ability to make
>> replacements that span lines.  Perlers like to "slurp" in the entire file
>> as a single string.
>>
>>          s = s.replace( "<CODE>", tag1);
>>          s = s.replace( "<Code>", tag1);
>>          s = s.replace("</CODE>", tag2);
>>          s = s.replace("</Code>", tag2);
>>
>> Why not use case-insensitive regex?
>>
>> Here's an emacs-lisp one-liner I've been known to use:
>>
>> (defun tt-code ()
>>    (interactive)
>>    (query-replace-regexp "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>"
>>                          "{@code \\2}"))
>>
>> With more work, one can automate transformation of embedded things like
>> &lt;
>>
>> But of course, it's not even possible to transform ALL uses of <code> to
>> {@code, if there was imaginative use of nested html tags.
>>
>>
>> On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov <
>> alexander.v.stepanov at oracle.com> wrote:
>>
>>> Updated: a few manual corrections were made (as @linkplain tags display
>>> nested {@code } literally):
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>>> Checked with specdiff (which of course does not cover documentation for
>>> internal packages); no unexpected diffs detected.
>>>
>>> Regards,
>>> Alexander
>>>
>>>
>>> On 9/27/2015 4:52 PM, Alexander Stepanov wrote:
>>>
>>>> Hello Martin,
>>>>
>>>> Here is a simple app to replace <code></code> tags with the new-style
>>>> {@code } one (definitely not as elegant as the Perl
>>>> one-liners):
>>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java
>>>>
>>>> Corresponding patch for jdk and replacement log (~62k of the tag
>>>> changes):
>>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log
>>>> (sorry, I have yet to check the correctness of the patch with specdiff,
>>>> so this is rather a demo at the moment).
>>>>
>>>> I don't know whether these changes (cosmetic by nature) are desired for
>>>> now or not. Moreover, some of them should probably go to other repos
>>>> (e.g., awt, swing -> "client" instead of "dev").
>>>>
>>>> Regards,
>>>> Alexander
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: alexander.v.stepanov at oracle.com
>>>> To: martinrb at google.com
>>>> Cc: core-libs-dev at openjdk.java.net
>>>> Sent: Thursday, 24 September 2015, 16:06:56 GMT +03:00 Moscow,
>>>> St. Petersburg, Volgograd
>>>> Subject: Re: RFR [9] 8133651: replace some <tt> tags (obsolete in html5) in
>>>> core-libs docs
>>>>
>>>> Hello Martin,
>>>>
>>>> Thank you for review and for the notes!
>>>>
>>>>    > I'm biased of course, but I like the approach I took with
>>>> blessed-modifier-order:
>>>>    > - make the change completely automated
>>>>    > - leave "human editing" for a separate change
>>>>    > - publish the code used to make the automated change (in my case,
>>>> typically a perl one-liner)
>>>>
>>>> Automated replacement has an obvious advantage: it is fast and massive.
>>>> But there are some disadvantages at the same time (just IMHO).
>>>>
>>>> With a script it is quite easy to miss some non-trivial cases, e.g.:
>>>> - removing unnecessary line breaks, like
>>>>     * <tt>someCode
>>>>     * </tt>
>>>> (which would be better replaced with a single-line {@code someCode});
>>>> - joining successive terms, like "<tt>ONE</tt>, <tt>TWO</tt>,
>>>> <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
>>>> - errors like an extra or missing "<" or ">": "* <tt>Collection
>>>> <T></tt>" (there were a lot of them);
>>>> - some cases where <tt></tt> should be replaced with <code></code>, not
>>>> {@code } (e.g. because of unicode characters inside the code);
>>>> - extra tags inside <tt> or <code> which should be moved outside
>>>> {@code }, like <tt><i>someCode</i></tt> or <tt><b>someCode</b></tt>;
>>>> - simply removing needless tags, like "<tt>{@link ...}</tt>" ->
>>>> "{@link ...}";
>>>> - replacing HTML codes with symbols ('<', '>', '@', ...);
>>>> - etc.;
>>>> - plus some other formatting changes and fixes for misprints which would
>>>> be omitted during the automated replacement (and wouldn't be done
>>>> manually in future because there is no motivation for repeated processing).
>>>>
>>>> So sometimes it may be difficult to say where the border lies between
>>>> "trivial" and "human-editing" cases (and the share of "non-trivial
>>>> cases" is definitely not minor); moreover, even an automated
>>>> replacement requires careful review before the webrev is published
>>>> (and again by reviewers, who probably wouldn't be happy to review
>>>> hundreds of files at the same time) and iterative checks/corrections.
>>>> specdiff is very useful for this task but also cannot fully cover the
>>>> diffs (as some changes are situated in the internal com/... sun/...
>>>> packages).
>>>>
>>>> Moreover, I'm sure that some reviewers would be annoyed by the fact
>>>> that some (quite simple) changes were postponed because they are "not
>>>> trivial enough to be fixed just now" (because they would suspect the
>>>> changes would be postponed forever). So the patch creator would
>>>> (probably) receive some advice during the review like "please also fix
>>>> this and that" (which is normal, of course).
>>>>
>>>> So my preference was to make the changes package by package (in some
>>>> reasonable number of files), not postponing part of the changes for the
>>>> future (sorry for these boring repeated review requests). Please note
>>>> that all of the above is *an explanation of my motivation rather
>>>> than an objection* :) (and of course I used some text editor replace
>>>> automation, which is surely not as advanced as Perl).
>>>>
>>>>    > It's probably correct, but I would have left it out of this change
>>>> Yes, I see. Reverted (please update the web page):
>>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html
>>>>
>>>> Thanks,
>>>> Alexander
>>>>
>>>> P.S. The <tt> replacement job is mostly (I guess, ~80%) complete. But
>>>> this approach should probably be used if a similar replacement task,
>>>> e.g. for <code></code> tags, is planned in future (there are
>>>> thousands of them).
>>>>
>>>>
>>>> On 9/24/2015 6:10 AM, Martin Buchholz wrote:
>>>>
>>>>
>>>>> On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
>>>>> <alexander.v.stepanov at oracle.com> wrote:
>>>>>
>>>>>       Hello,
>>>>>
>>>>>       Could you please review the following fix
>>>>>       http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/
>>>>>       http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html
>>>>>       for
>>>>>       https://bugs.openjdk.java.net/browse/JDK-8133651
>>>>>
>>>>>       Just another portion of deprecated <tt> (and <xmp>) tags replaced
>>>>>       with {@code }. Some misprints were also fixed.
>>>>>
>>>>>
>>>>> I'm biased of course, but I like the approach I took with
>>>>> blessed-modifier-order:
>>>>> - make the change completely automated
>>>>> - leave "human editing" for a separate change
>>>>> - publish the code used to make the automated change (in my case,
>>>>> typically a perl one-liner)
>>>>>
>>>>>
>>>>>       The following (expected) changes were detected by specdiff:
>>>>>       - removed needless dashes in java.util.Locale,
>>>>>       - removed needless curly brace in
>>>>> xml.bind.annotation.XmlElementRef
>>>>>
>>>>>
>>>>> I would do a separate automated "removed needless dashes" changeset.
>>>>>
>>>>>
>>>>>       Please let me know if the following changes are desirable or not:
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>>>>>
>>>>> This is an actual change to the behavior of this code - the
>>>>> maintainers of jconsole need to approve it.  It's probably correct,
>>>>> but I would have left it out of this change. If you remove it, then I
>>>>> approve this change.
>>>>>
>>>>>
>>>>
>>>


