P.S.: RFR [9] 8133651: automated replacing of old-style tags in docs

Alexander Stepanov alexander.v.stepanov at oracle.com
Thu Oct 1 13:34:27 UTC 2015


Hello Lance,

Yes, to be honest, I also fear to push at once a single changeset for 
jdk touching thousands of files.

Moreover, I'm pretty sure that e.g. java.desktop part should go to a 
client repository instead of dev, changes for client and security libs 
should be approved by corresponding dev teams etc. Also, probably some 
of these cosmetic changes should be postponed for now to avoid potential 
merging issues for people who really do something valuable.

So I'll try to split the initial overall fix for jdk part into some 
reasonable amount of sub-tasks and send RFRs one by one, as earlier.

Thanks,
Alexander

On 10/1/2015 3:40 PM, Lance Andersen wrote:
> Hi Alexander,
>
> Personally, I would find it easier to review if the changes were sent 
> out by category/module vs one large patch as it is easier to miss things.
>
> Also for technologies such as jaxws, these will need to go to the 
> external workspace at the same time for the technology so these should 
> definitely be kept separate.
>
> Best
> Lance
> On Oct 1, 2015, at 7:31 AM, Alexander Stepanov 
> <alexander.v.stepanov at oracle.com 
> <mailto:alexander.v.stepanov at oracle.com>> wrote:
>
>>
>>
>> So if you don't have objections, I'll delay for a several days and 
>> then publish a final RFR (probably containing changes in some other 
>> repos like jaxws, corba or jaxp) which would be more formal 
>> (containing bug # and the final specdiff report).
>>
>> Thanks again,
>> Alexander
>>
>>
>> On 10/1/2015 9:54 AM, Martin Buchholz wrote:
>>> Hi s'marks,
>>> You probably don't need to absolutify paths.
>>> And you can easily handle multiple args.
>>>
>>> (just for fun!)
>>> Checks for javadoc comment; handles popular html entities; handles 
>>> multiple lines; handles both tt and code:
>>>
>>> #!/bin/bash
>>> find "$@" -name '*.java' | \
>>>  xargs -r perl -p0777i -e \
>>>  'do {} while s~^ 
>>> *\*.*\K<(tt|code)>((?:[^<>{}\&\@]|&(?:lt|gt|amp);)*)</\1>~$_=$2; 
>>> s/</</g; s/>/>/g; s/&/&/g; "{\@code $_}"~mgie'
>>>
>>>
>>> On Wed, Sep 30, 2015 at 6:16 PM, Stuart Marks 
>>> <stuart.marks at oracle.com <mailto:stuart.marks at oracle.com> 
>>> <mailto:stuart.marks at oracle.com>> wrote:
>>>
>>>    Hi Alexander, Martin,
>>>
>>>    The challenge of Perl file slurping and Emacs one-liners was too
>>>    much to bear.
>>>
>>>    This is Java, so one-liners are hardly possible. Still, there are
>>>    a bunch of improvements that can be made to the Java version. (OK,
>>>    and I'm showing off a bit.)
>>>
>>>    Take a look at this:
>>>
>>> http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java 
>>> <http://cr.openjdk.java.net/%7Esmarks/misc/SimpleTagEditorSmarks1.java> 
>>> <http://cr.openjdk.java.net/%7Esmarks/misc/SimpleTagEditorSmarks1.java>
>>>
>>>    I haven't studied the output exhaustively, but it seems to do a
>>>    reasonably good job for the common cases. I ran it over java.lang
>>>    and I noticed a few cases where there is markup embedded within
>>>    <code></code> text, which should be looked at more closely.
>>>
>>>    I don't particularly care if you use my version, but there are
>>>    some techniques that I'd strongly recommend that you consider
>>>    using in any such tool. In particular:
>>>
>>>     - Pattern.DOTALL to do multi-line matches
>>>     - Pattern.CASE_INSENSITIVE
>>>     - try-with-resources to ensure that files are closed properly
>>>     - NIO instead of old java.io <http://java.io> APIs, particularly
>>>    Files.walk() and streams
>>>     - use Scanner to deal with input file buffering
>>>     - Scanner's stream support (I recently added this to JDK 9)
>>>
>>>    Enjoy,
>>>
>>>    s'marks
>>>
>>>
>>>
>>>    On 9/29/15 2:23 PM, Martin Buchholz wrote:
>>>
>>>        Hi Alexander,
>>>
>>>        your change looks good.  It's OK to have manual corrections
>>>        for automated
>>>        mega-changes like this, as long as they all revert changes.
>>>
>>>        Random comments:
>>>
>>>        Should you publish your specdiff?  I guess not - it would be
>>>        empty!
>>>
>>>                     while((s = br.readLine()) != null) {
>>>
>>>        by matching only one line at a time, you lose the ability to make
>>>        replacements that span lines.  Perlers like to "slurp" in the
>>>        entire file
>>>        as a single string.
>>>
>>>                 s = s.replace( "<CODE>", tag1);
>>>                 s = s.replace( "<Code>", tag1);
>>>                 s = s.replace("</CODE>", tag2);
>>>                 s = s.replace("</Code>", tag2);
>>>
>>>        Why not use case-insensitive regex?
>>>
>>>        Here's an emacs-lisp one-liner I've been known to use:
>>>
>>>        (defun tt-code ()
>>>           (interactive)
>>>           (query-replace-regexp
>>>        "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>" "{@code
>>>        \\2}"))
>>>
>>>        With more work, one can automate transformation of embedded
>>>        things like <
>>>
>>>        But of course, it's not even possible to transform ALL uses of
>>>        <code> to
>>>        {@code, if there was imaginative use of nested html tags.
>>>
>>>
>>>        On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov <
>>> alexander.v.stepanov at oracle.com <mailto:alexander.v.stepanov at oracle.com>
>>>        <mailto:alexander.v.stepanov at oracle.com>> wrote:
>>>
>>>            Updated: a few manual corrections were made (as @linkplain
>>>            tags displays
>>>            nested {@code } literally):
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch 
>>> <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch> 
>>> <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch>
>>>            -checked with specdiff (which of course does not cover
>>>            documentation for
>>>            internal packages), no unexpected diffs detected.
>>>
>>>            Regards,
>>>            Alexander
>>>
>>>
>>>            On 9/27/2015 4:52 PM, Alexander Stepanov wrote:
>>>
>>>                Hello Martin,
>>>
>>>                Here is some simple app. to replace <code></code> tags
>>>                with a new-style
>>>                {@code } one (which is definitely not so elegant as
>>>                the Perl one-liners):
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java 
>>> <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/SimpleTagEditor.java>
>>>                <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/SimpleTagEditor.java>
>>>
>>>                Corresponding patch for jdk and replacement log (~62k
>>>                of the tag changes):
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch 
>>> <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch>
>>>                <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch>
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log 
>>> <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/replace.log>
>>>                <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/replace.log>
>>>                (sorry, I have to check the correctness of the patch
>>>                with specdiff yet,
>>>                so this is rather demo at the moment).
>>>
>>>                Don't know if these changes (cosmetic by nature) are
>>>                desired for now or
>>>                not. Moreover, probably some part of them should go to
>>>                another repos (e.g.,
>>>                awt, swing -> "client" instead of "dev").
>>>
>>>                Regards,
>>>                Alexander
>>>
>>>
>>>
>>>                ----- Исходное сообщение -----
>>>                От: alexander.v.stepanov at oracle.com 
>>> <mailto:alexander.v.stepanov at oracle.com>
>>>                <mailto:alexander.v.stepanov at oracle.com>
>>>                Кому: martinrb at google.com 
>>> <mailto:martinrb at google.com> <mailto:martinrb at google.com>
>>>                Копия: core-libs-dev at openjdk.java.net 
>>> <mailto:core-libs-dev at openjdk.java.net>
>>>                <mailto:core-libs-dev at openjdk.java.net>
>>>                Отправленные: Четверг, 24 Сентябрь 2015 г 16:06:56 GMT
>>>                +03:00 Москва,
>>>                Санкт-Петербург, Волгоград
>>>                Тема: Re: RFR [9] 8133651: replace some <tt> tags
>>>                (obsolete in html5) in
>>>                core-libs docs
>>>
>>>                Hello Martin,
>>>
>>>                Thank you for review and for the notes!
>>>
>>>                   > I'm biased of course, but I like the approach I
>>>                took with
>>>                blessed-modifier-order:
>>>                   > - make the change completely automated
>>>                   > - leave "human editing" for a separate change
>>>                   > - publish the code used to make the automated
>>>                change (in my case,
>>>                typically a perl one-liner)
>>>
>>>                Automated replacement has an obvious advantage: it is
>>>                fast and massive.
>>>                But there are some disadvantages at the same time
>>>                (just IMHO).
>>>
>>>                Using script it is quite easy to miss some not very
>>>                trivial cases, e.g.:
>>>                - remove unnecessary linebreaks, like
>>>                    * <tt>someCode
>>>                    * </tt>
>>>                (which would be better to replace with single-line
>>>                {@code someCode};
>>>                - joining of successive terms, like "<tt>ONE</tt>,
>>>                <tt>TWO</tt>,
>>>                <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
>>>                - errors like extra or missing "<" or ">": *
>>>                <tt>Collection
>>>                <T></tt>", - there were a lot of them;
>>>                - some cases when <tt></tt> should be replaced with
>>>                <code></code>, not
>>>                {@code } (e.g. because of unicode characters inside of
>>>                code etc.);
>>>                - extra tags inside of <tt> or <code> which should be
>>>                moved outside of
>>>                {@code }, like <tt><i>someCode</i></tt> or
>>>                <tt><b>someCode</b></tt>;
>>>                - simple removing of needless tags, like "<tt>{@link
>>>                ...}</tt>" ->
>>>                "{@link ...}";
>>>                - replace HTML codes with symbols ('<', '>', '@', ...)
>>>                - etc.
>>>                - plus some other formatting changes and fixes for
>>>                misprints which would
>>>                be omitted during the automated replacement (and
>>>                wouldn't be done in
>>>                future manually because there is no motivation for
>>>                repeated processing).
>>>
>>>                So sometimes it may be difficult to say where is the
>>>                border between
>>>                "trivial" and "human-editing" cases (and the portion
>>>                of "non-trivial
>>>                cases" is definitely not minor); moreover, even the
>>>                automated
>>>                replacement requires the subsequent careful review
>>>                before publishing of
>>>                webrev (as well as by reviewers who probably wouldn't
>>>                be happy to review
>>>                hundreds of files at the same time) and iterative
>>>                checks/corrections.
>>>                specdiff is very useful for this task but also cannot
>>>                fully cover the
>>>                diffs (as some changes are situated in the internal
>>>                com/... sun/...
>>>                packages).
>>>
>>>                Moreover, I'm sure that some reviewers would be
>>>                annoyed with the fact
>>>                that some (quite simple) changes were postponed
>>>                because they are "not
>>>                too trivial to be fixed just now" (because they will
>>>                suspect they would
>>>                be postponed forever). So the patch creator would
>>>                (probably) receive
>>>                some advices during the review like "please fix also
>>>                fix this and that"
>>>                (which is normal, of course).
>>>
>>>                So my preference was to make the changes package by
>>>                package (in some
>>>                reasonable amount of files) not postponing part of the
>>>                changes for the
>>>                future (sorry for these boring repeating review
>>>                requests). Please note
>>>                that all the above mentioned is *rather explanation of
>>>                my motivation
>>>                than objection* :) (and of course I used some text
>>>                editor replace
>>>                automation which is surely not so advanced as Perl).
>>>
>>>                   > It's probably correct, but I would have left it
>>>                out of this change
>>>                Yes, I see. Reverted (please update the web page):
>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html 
>>> <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/index.html>
>>>                <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/index.html>
>>>
>>>                Thanks,
>>>                Alexander
>>>
>>>                P.S. The <tt> replacement job is mostly (I guess,
>>>                ~80%) complete. But
>>>                probably this approach should be used if some similar
>>>                replacement task
>>>                for, e.g., <code></code> tags would be planned in
>>>                future (there are
>>>                thousands of them).
>>>
>>>
>>>                On 9/24/2015 6:10 AM, Martin Buchholz wrote:
>>>
>>>
>>>                    On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
>>>                    <alexander.v.stepanov at oracle.com 
>>> <mailto:alexander.v.stepanov at oracle.com>
>>>                    <mailto:alexander.v.stepanov at oracle.com>
>>>                    <mailto:alexander.v.stepanov at oracle.com
>>>                    <mailto:alexander.v.stepanov at oracle.com>>> wrote:
>>>
>>>                          Hello,
>>>
>>>                          Could you please review the following fix
>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/ 
>>> <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/>
>>>                    <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/>
>>>                                             <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/>
>>> http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html 
>>> <http://cr.openjdk.java.net/%7Eavstepan/8133651/jaxws.00/index.html>
>>>                    <http://cr.openjdk.java.net/%7Eavstepan/8133651/jaxws.00/index.html>
>>>                                             <http://cr.openjdk.java.net/%7Eavstepan/8133651/jaxws.00/index.html
>>>
>>>
>>>                          for
>>> https://bugs.openjdk.java.net/browse/JDK-8133651
>>>
>>>                          Just another portion of deprecated <tt> (and
>>>                    <xmp>) tags replaced
>>>                          with {@code }. Some misprints were also fixed.
>>>
>>>
>>>                    I'm biased of course, but I like the approach I
>>>                    took with
>>>                    blessed-modifier-order:
>>>                    - make the change completely automated
>>>                    - leave "human editing" for a separate change
>>>                    - publish the code used to make the automated
>>>                    change (in my case,
>>>                    typically a perl one-liner)
>>>
>>>
>>>                          The following (expected) changes were
>>>                    detected by specdiff:
>>>                          - removed needless dashes in java.util.Locale,
>>>                          - removed needless curly brace in
>>>                    xml.bind.annotation.XmlElementRef
>>>
>>>
>>>                    I would do a separate automated "removed needless
>>>                    dashes" changeset.
>>>
>>>
>>>                          Please let me know if the following changes
>>>                    are desirable or not:
>>>
>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html 
>>> <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html>
>>>                    <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html>
>>>                          <
>>> http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>>>
>>>
>>>
>>>
>>>                    This is an actual change to the behavior of this
>>>                    code - the
>>>                    maintainers of jconsole need to approve it. It's
>>>                    probably correct,
>>>                    but I would have left it out of this change. If
>>>                    you remove it, then I
>>>                    approve this change.
>>>
>>>
>>>
>>>
>>
>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance 
> Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>
>
>




More information about the core-libs-dev mailing list