P.S.: RFR [9] 8133651: automated replacing of old-style tags in docs

Roger Riggs Roger.Riggs at Oracle.com
Thu Oct 1 13:00:59 UTC 2015


+1 for manageable sized and per repo code-reviews.

(unless someone has a tool that checks the patch to confirm that the 
only difference between
the old and the new is the same small number of substitutions; another 
quick program perhaps...).

On 10/1/2015 8:40 AM, Lance Andersen wrote:
> Hi Alexander,
>
> Personally, I would find it easier to review if the changes were sent out by category/module vs one large patch as it is easier to miss things.
>
> Also for technologies such as jaxws, these will need to go to the external workspace at the same time for the technology so these should definitely be kept separate.
>
> Best
> Lance
> On Oct 1, 2015, at 7:31 AM, Alexander Stepanov <alexander.v.stepanov at oracle.com> wrote:
>
>>
>> So if you don't have objections, I'll delay for a several days and then publish a final RFR (probably containing changes in some other repos like jaxws, corba or jaxp) which would be more formal (containing bug # and the final specdiff report).
>>
>> Thanks again,
>> Alexander
>>
>>
>> On 10/1/2015 9:54 AM, Martin Buchholz wrote:
>>> Hi s'marks,
>>> You probably don't need to absolutify paths.
>>> And you can easily handle multiple args.
>>>
>>> (just for fun!)
>>> Checks for javadoc comment; handles popular html entities; handles multiple lines; handles both tt and code:
>>>
>>> #!/bin/bash
>>> find "$@" -name '*.java' | \
>>>   xargs -r perl -p0777i -e \
>>>   'do {} while s~^ *\*.*\K<(tt|code)>((?:[^<>{}\&\@]|&(?:lt|gt|amp);)*)</\1>~$_=$2; s/</</g; s/>/>/g; s/&/&/g; "{\@code $_}"~mgie'
>>>
>>>
>>> On Wed, Sep 30, 2015 at 6:16 PM, Stuart Marks <stuart.marks at oracle.com <mailto:stuart.marks at oracle.com>> wrote:
>>>
>>>     Hi Alexander, Martin,
>>>
>>>     The challenge of Perl file slurping and Emacs one-liners was too
>>>     much to bear.
>>>
>>>     This is Java, so one-liners are hardly possible. Still, there are
>>>     a bunch of improvements that can be made to the Java version. (OK,
>>>     and I'm showing off a bit.)
>>>
>>>     Take a look at this:
>>>
>>>     http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java <http://cr.openjdk.java.net/%7Esmarks/misc/SimpleTagEditorSmarks1.java>
>>>
>>>     I haven't studied the output exhaustively, but it seems to do a
>>>     reasonably good job for the common cases. I ran it over java.lang
>>>     and I noticed a few cases where there is markup embedded within
>>>     <code></code> text, which should be looked at more closely.
>>>
>>>     I don't particularly care if you use my version, but there are
>>>     some techniques that I'd strongly recommend that you consider
>>>     using in any such tool. In particular:
>>>
>>>      - Pattern.DOTALL to do multi-line matches
>>>      - Pattern.CASE_INSENSITIVE
>>>      - try-with-resources to ensure that files are closed properly
>>>      - NIO instead of old java.io <http://java.io> APIs, particularly
>>>     Files.walk() and streams
>>>      - use Scanner to deal with input file buffering
>>>      - Scanner's stream support (I recently added this to JDK 9)
>>>
>>>     Enjoy,
>>>
>>>     s'marks
>>>
>>>
>>>
>>>     On 9/29/15 2:23 PM, Martin Buchholz wrote:
>>>
>>>         Hi Alexander,
>>>
>>>         your change looks good.  It's OK to have manual corrections
>>>         for automated
>>>         mega-changes like this, as long as they all revert changes.
>>>
>>>         Random comments:
>>>
>>>         Should you publish your specdiff?  I guess not - it would be
>>>         empty!
>>>
>>>                      while((s = br.readLine()) != null) {
>>>
>>>         by matching only one line at a time, you lose the ability to make
>>>         replacements that span lines.  Perlers like to "slurp" in the
>>>         entire file
>>>         as a single string.
>>>
>>>                  s = s.replace( "<CODE>", tag1);
>>>                  s = s.replace( "<Code>", tag1);
>>>                  s = s.replace("</CODE>", tag2);
>>>                  s = s.replace("</Code>", tag2);
>>>
>>>         Why not use case-insensitive regex?
>>>
>>>         Here's an emacs-lisp one-liner I've been known to use:
>>>
>>>         (defun tt-code ()
>>>            (interactive)
>>>            (query-replace-regexp
>>>         "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>" "{@code
>>>         \\2}"))
>>>
>>>         With more work, one can automate transformation of embedded
>>>         things like <
>>>
>>>         But of course, it's not even possible to transform ALL uses of
>>>         <code> to
>>>         {@code, if there was imaginative use of nested html tags.
>>>
>>>
>>>         On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov <
>>>         alexander.v.stepanov at oracle.com
>>>         <mailto:alexander.v.stepanov at oracle.com>> wrote:
>>>
>>>             Updated: a few manual corrections were made (as @linkplain
>>>             tags displays
>>>             nested {@code } literally):
>>>             http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch>
>>>             -checked with specdiff (which of course does not cover
>>>             documentation for
>>>             internal packages), no unexpected diffs detected.
>>>
>>>             Regards,
>>>             Alexander
>>>
>>>
>>>             On 9/27/2015 4:52 PM, Alexander Stepanov wrote:
>>>
>>>                 Hello Martin,
>>>
>>>                 Here is some simple app. to replace <code></code> tags
>>>                 with a new-style
>>>                 {@code } one (which is definitely not so elegant as
>>>                 the Perl one-liners):
>>>                 http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java
>>>                 <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/SimpleTagEditor.java>
>>>
>>>                 Corresponding patch for jdk and replacement log (~62k
>>>                 of the tag changes):
>>>                 http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>>>                 <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch>
>>>                 http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log
>>>                 <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/replace.log>
>>>                 (sorry, I have to check the correctness of the patch
>>>                 with specdiff yet,
>>>                 so this is rather demo at the moment).
>>>
>>>                 Don't know if these changes (cosmetic by nature) are
>>>                 desired for now or
>>>                 not. Moreover, probably some part of them should go to
>>>                 another repos (e.g.,
>>>                 awt, swing -> "client" instead of "dev").
>>>
>>>                 Regards,
>>>                 Alexander
>>>
>>>
>>>
>>>                 ----- Исходное сообщение -----
>>>                 От: alexander.v.stepanov at oracle.com
>>>                 <mailto:alexander.v.stepanov at oracle.com>
>>>                 Кому: martinrb at google.com <mailto:martinrb at google.com>
>>>                 Копия: core-libs-dev at openjdk.java.net
>>>                 <mailto:core-libs-dev at openjdk.java.net>
>>>                 Отправленные: Четверг, 24 Сентябрь 2015 г 16:06:56 GMT
>>>                 +03:00 Москва,
>>>                 Санкт-Петербург, Волгоград
>>>                 Тема: Re: RFR [9] 8133651: replace some <tt> tags
>>>                 (obsolete in html5) in
>>>                 core-libs docs
>>>
>>>                 Hello Martin,
>>>
>>>                 Thank you for review and for the notes!
>>>
>>>                    > I'm biased of course, but I like the approach I
>>>                 took with
>>>                 blessed-modifier-order:
>>>                    > - make the change completely automated
>>>                    > - leave "human editing" for a separate change
>>>                    > - publish the code used to make the automated
>>>                 change (in my case,
>>>                 typically a perl one-liner)
>>>
>>>                 Automated replacement has an obvious advantage: it is
>>>                 fast and massive.
>>>                 But there are some disadvantages at the same time
>>>                 (just IMHO).
>>>
>>>                 Using script it is quite easy to miss some not very
>>>                 trivial cases, e.g.:
>>>                 - remove unnecessary linebreaks, like
>>>                     * <tt>someCode
>>>                     * </tt>
>>>                 (which would be better to replace with single-line
>>>                 {@code someCode};
>>>                 - joining of successive terms, like "<tt>ONE</tt>,
>>>                 <tt>TWO</tt>,
>>>                 <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
>>>                 - errors like extra or missing "<" or ">": *
>>>                 <tt>Collection
>>>                 <T></tt>", - there were a lot of them;
>>>                 - some cases when <tt></tt> should be replaced with
>>>                 <code></code>, not
>>>                 {@code } (e.g. because of unicode characters inside of
>>>                 code etc.);
>>>                 - extra tags inside of <tt> or <code> which should be
>>>                 moved outside of
>>>                 {@code }, like <tt><i>someCode</i></tt> or
>>>                 <tt><b>someCode</b></tt>;
>>>                 - simple removing of needless tags, like "<tt>{@link
>>>                 ...}</tt>" ->
>>>                 "{@link ...}";
>>>                 - replace HTML codes with symbols ('<', '>', '@', ...)
>>>                 - etc.
>>>                 - plus some other formatting changes and fixes for
>>>                 misprints which would
>>>                 be omitted during the automated replacement (and
>>>                 wouldn't be done in
>>>                 future manually because there is no motivation for
>>>                 repeated processing).
>>>
>>>                 So sometimes it may be difficult to say where is the
>>>                 border between
>>>                 "trivial" and "human-editing" cases (and the portion
>>>                 of "non-trivial
>>>                 cases" is definitely not minor); moreover, even the
>>>                 automated
>>>                 replacement requires the subsequent careful review
>>>                 before publishing of
>>>                 webrev (as well as by reviewers who probably wouldn't
>>>                 be happy to review
>>>                 hundreds of files at the same time) and iterative
>>>                 checks/corrections.
>>>                 specdiff is very useful for this task but also cannot
>>>                 fully cover the
>>>                 diffs (as some changes are situated in the internal
>>>                 com/... sun/...
>>>                 packages).
>>>
>>>                 Moreover, I'm sure that some reviewers would be
>>>                 annoyed with the fact
>>>                 that some (quite simple) changes were postponed
>>>                 because they are "not
>>>                 too trivial to be fixed just now" (because they will
>>>                 suspect they would
>>>                 be postponed forever). So the patch creator would
>>>                 (probably) receive
>>>                 some advices during the review like "please fix also
>>>                 fix this and that"
>>>                 (which is normal, of course).
>>>
>>>                 So my preference was to make the changes package by
>>>                 package (in some
>>>                 reasonable amount of files) not postponing part of the
>>>                 changes for the
>>>                 future (sorry for these boring repeating review
>>>                 requests). Please note
>>>                 that all the above mentioned is *rather explanation of
>>>                 my motivation
>>>                 than objection* :) (and of course I used some text
>>>                 editor replace
>>>                 automation which is surely not so advanced as Perl).
>>>
>>>                    > It's probably correct, but I would have left it
>>>                 out of this change
>>>                 Yes, I see. Reverted (please update the web page):
>>>                 http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html
>>>                 <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/index.html>
>>>
>>>                 Thanks,
>>>                 Alexander
>>>
>>>                 P.S. The <tt> replacement job is mostly (I guess,
>>>                 ~80%) complete. But
>>>                 probably this approach should be used if some similar
>>>                 replacement task
>>>                 for, e.g., <code></code> tags would be planned in
>>>                 future (there are
>>>                 thousands of them).
>>>
>>>
>>>                 On 9/24/2015 6:10 AM, Martin Buchholz wrote:
>>>
>>>
>>>                     On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
>>>                     <alexander.v.stepanov at oracle.com
>>>                     <mailto:alexander.v.stepanov at oracle.com>
>>>                     <mailto:alexander.v.stepanov at oracle.com
>>>                     <mailto:alexander.v.stepanov at oracle.com>>> wrote:
>>>
>>>                           Hello,
>>>
>>>                           Could you please review the following fix
>>>                     http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/
>>>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/>
>>>                                              <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/>
>>>                     http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html
>>>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jaxws.00/index.html>
>>>                                              <http://cr.openjdk.java.net/%7Eavstepan/8133651/jaxws.00/index.html
>>>
>>>
>>>                           for
>>>                     https://bugs.openjdk.java.net/browse/JDK-8133651
>>>
>>>                           Just another portion of deprecated <tt> (and
>>>                     <xmp>) tags replaced
>>>                           with {@code }. Some misprints were also fixed.
>>>
>>>
>>>                     I'm biased of course, but I like the approach I
>>>                     took with
>>>                     blessed-modifier-order:
>>>                     - make the change completely automated
>>>                     - leave "human editing" for a separate change
>>>                     - publish the code used to make the automated
>>>                     change (in my case,
>>>                     typically a perl one-liner)
>>>
>>>
>>>                           The following (expected) changes were
>>>                     detected by specdiff:
>>>                           - removed needless dashes in java.util.Locale,
>>>                           - removed needless curly brace in
>>>                     xml.bind.annotation.XmlElementRef
>>>
>>>
>>>                     I would do a separate automated "removed needless
>>>                     dashes" changeset.
>>>
>>>
>>>                           Please let me know if the following changes
>>>                     are desirable or not:
>>>
>>>                     http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>>>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html>
>>>                           <
>>>                     http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>>>
>>>
>>>
>>>
>>>                     This is an actual change to the behavior of this
>>>                     code - the
>>>                     maintainers of jconsole need to approve it. It's
>>>                     probably correct,
>>>                     but I would have left it out of this change. If
>>>                     you remove it, then I
>>>                     approve this change.
>>>
>>>
>>>
>>>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>




More information about the core-libs-dev mailing list