P.S.: RFR [9] 8133651: automated replacing of old-style tags in docs

Alexander Stepanov alexander.v.stepanov at oracle.com
Thu Oct 1 11:31:39 UTC 2015


Hello Martin, Stuart,

Thank you for the notes,

Yes, the initial utility is quite ugly, I just tried to prepare it as 
quickly as possible hoping that it covers the majority of "trivial" 
replace cases. Yes, it does not process multi-line <code> inclusions.

 >  s = s.replace( "<CODE>", tag1);
 >  s = s.replace( "<Code>", tag1);
 >  s = s.replace("</CODE>", tag2);
 >  s = s.replace("</Code>", tag2);

- replaced with "s = ln.replaceAll("(?i)<code>", 
"<code>").replaceAll("(?i)</code>", "</code>");"

Unfortunately my Perl/lisp knowledge are zero :)

 > Should you publish your specdiff?  I guess not - it would be empty!
For now it contains a single fixed misprint diff, but there are a few 
another misprints at the moment which I'd like to include in the patch 
as well.

So if you don't have objections, I'll delay for a several days and then 
publish a final RFR (probably containing changes in some other repos 
like jaxws, corba or jaxp) which would be more formal (containing bug # 
and the final specdiff report).

Thanks again,
Alexander


On 10/1/2015 9:54 AM, Martin Buchholz wrote:
> Hi s'marks,
> You probably don't need to absolutify paths.
> And you can easily handle multiple args.
>
> (just for fun!)
> Checks for javadoc comment; handles popular html entities; handles 
> multiple lines; handles both tt and code:
>
> #!/bin/bash
> find "$@" -name '*.java' | \
>   xargs -r perl -p0777i -e \
>   'do {} while s~^ 
> *\*.*\K<(tt|code)>((?:[^<>{}\&\@]|&(?:lt|gt|amp);)*)</\1>~$_=$2; 
> s/</</g; s/>/>/g; s/&/&/g; "{\@code $_}"~mgie'
>
>
> On Wed, Sep 30, 2015 at 6:16 PM, Stuart Marks <stuart.marks at oracle.com 
> <mailto:stuart.marks at oracle.com>> wrote:
>
>     Hi Alexander, Martin,
>
>     The challenge of Perl file slurping and Emacs one-liners was too
>     much to bear.
>
>     This is Java, so one-liners are hardly possible. Still, there are
>     a bunch of improvements that can be made to the Java version. (OK,
>     and I'm showing off a bit.)
>
>     Take a look at this:
>
>     http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java <http://cr.openjdk.java.net/%7Esmarks/misc/SimpleTagEditorSmarks1.java>
>
>     I haven't studied the output exhaustively, but it seems to do a
>     reasonably good job for the common cases. I ran it over java.lang
>     and I noticed a few cases where there is markup embedded within
>     <code></code> text, which should be looked at more closely.
>
>     I don't particularly care if you use my version, but there are
>     some techniques that I'd strongly recommend that you consider
>     using in any such tool. In particular:
>
>      - Pattern.DOTALL to do multi-line matches
>      - Pattern.CASE_INSENSITIVE
>      - try-with-resources to ensure that files are closed properly
>      - NIO instead of old java.io <http://java.io> APIs, particularly
>     Files.walk() and streams
>      - use Scanner to deal with input file buffering
>      - Scanner's stream support (I recently added this to JDK 9)
>
>     Enjoy,
>
>     s'marks
>
>
>
>     On 9/29/15 2:23 PM, Martin Buchholz wrote:
>
>         Hi Alexander,
>
>         your change looks good.  It's OK to have manual corrections
>         for automated
>         mega-changes like this, as long as they all revert changes.
>
>         Random comments:
>
>         Should you publish your specdiff?  I guess not - it would be
>         empty!
>
>                      while((s = br.readLine()) != null) {
>
>         by matching only one line at a time, you lose the ability to make
>         replacements that span lines.  Perlers like to "slurp" in the
>         entire file
>         as a single string.
>
>                  s = s.replace( "<CODE>", tag1);
>                  s = s.replace( "<Code>", tag1);
>                  s = s.replace("</CODE>", tag2);
>                  s = s.replace("</Code>", tag2);
>
>         Why not use case-insensitive regex?
>
>         Here's an emacs-lisp one-liner I've been known to use:
>
>         (defun tt-code ()
>            (interactive)
>            (query-replace-regexp
>         "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>" "{@code
>         \\2}"))
>
>         With more work, one can automate transformation of embedded
>         things like <
>
>         But of course, it's not even possible to transform ALL uses of
>         <code> to
>         {@code, if there was imaginative use of nested html tags.
>
>
>         On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov <
>         alexander.v.stepanov at oracle.com
>         <mailto:alexander.v.stepanov at oracle.com>> wrote:
>
>             Updated: a few manual corrections were made (as @linkplain
>             tags displays
>             nested {@code } literally):
>             http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch>
>             -checked with specdiff (which of course does not cover
>             documentation for
>             internal packages), no unexpected diffs detected.
>
>             Regards,
>             Alexander
>
>
>             On 9/27/2015 4:52 PM, Alexander Stepanov wrote:
>
>                 Hello Martin,
>
>                 Here is some simple app. to replace <code></code> tags
>                 with a new-style
>                 {@code } one (which is definitely not so elegant as
>                 the Perl one-liners):
>                 http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java
>                 <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/SimpleTagEditor.java>
>
>                 Corresponding patch for jdk and replacement log (~62k
>                 of the tag changes):
>                 http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>                 <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/jdk.patch>
>                 http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log
>                 <http://cr.openjdk.java.net/%7Eavstepan/tmp/codeTags/replace.log>
>                 (sorry, I have to check the correctness of the patch
>                 with specdiff yet,
>                 so this is rather demo at the moment).
>
>                 Don't know if these changes (cosmetic by nature) are
>                 desired for now or
>                 not. Moreover, probably some part of them should go to
>                 another repos (e.g.,
>                 awt, swing -> "client" instead of "dev").
>
>                 Regards,
>                 Alexander
>
>
>
>                 ----- Исходное сообщение -----
>                 От: alexander.v.stepanov at oracle.com
>                 <mailto:alexander.v.stepanov at oracle.com>
>                 Кому: martinrb at google.com <mailto:martinrb at google.com>
>                 Копия: core-libs-dev at openjdk.java.net
>                 <mailto:core-libs-dev at openjdk.java.net>
>                 Отправленные: Четверг, 24 Сентябрь 2015 г 16:06:56 GMT
>                 +03:00 Москва,
>                 Санкт-Петербург, Волгоград
>                 Тема: Re: RFR [9] 8133651: replace some <tt> tags
>                 (obsolete in html5) in
>                 core-libs docs
>
>                 Hello Martin,
>
>                 Thank you for review and for the notes!
>
>                    > I'm biased of course, but I like the approach I
>                 took with
>                 blessed-modifier-order:
>                    > - make the change completely automated
>                    > - leave "human editing" for a separate change
>                    > - publish the code used to make the automated
>                 change (in my case,
>                 typically a perl one-liner)
>
>                 Automated replacement has an obvious advantage: it is
>                 fast and massive.
>                 But there are some disadvantages at the same time
>                 (just IMHO).
>
>                 Using script it is quite easy to miss some not very
>                 trivial cases, e.g.:
>                 - remove unnecessary linebreaks, like
>                     * <tt>someCode
>                     * </tt>
>                 (which would be better to replace with single-line
>                 {@code someCode};
>                 - joining of successive terms, like "<tt>ONE</tt>,
>                 <tt>TWO</tt>,
>                 <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
>                 - errors like extra or missing "<" or ">": *
>                 <tt>Collection
>                 <T></tt>", - there were a lot of them;
>                 - some cases when <tt></tt> should be replaced with
>                 <code></code>, not
>                 {@code } (e.g. because of unicode characters inside of
>                 code etc.);
>                 - extra tags inside of <tt> or <code> which should be
>                 moved outside of
>                 {@code }, like <tt><i>someCode</i></tt> or
>                 <tt><b>someCode</b></tt>;
>                 - simple removing of needless tags, like "<tt>{@link
>                 ...}</tt>" ->
>                 "{@link ...}";
>                 - replace HTML codes with symbols ('<', '>', '@', ...)
>                 - etc.
>                 - plus some other formatting changes and fixes for
>                 misprints which would
>                 be omitted during the automated replacement (and
>                 wouldn't be done in
>                 future manually because there is no motivation for
>                 repeated processing).
>
>                 So sometimes it may be difficult to say where is the
>                 border between
>                 "trivial" and "human-editing" cases (and the portion
>                 of "non-trivial
>                 cases" is definitely not minor); moreover, even the
>                 automated
>                 replacement requires the subsequent careful review
>                 before publishing of
>                 webrev (as well as by reviewers who probably wouldn't
>                 be happy to review
>                 hundreds of files at the same time) and iterative
>                 checks/corrections.
>                 specdiff is very useful for this task but also cannot
>                 fully cover the
>                 diffs (as some changes are situated in the internal
>                 com/... sun/...
>                 packages).
>
>                 Moreover, I'm sure that some reviewers would be
>                 annoyed with the fact
>                 that some (quite simple) changes were postponed
>                 because they are "not
>                 too trivial to be fixed just now" (because they will
>                 suspect they would
>                 be postponed forever). So the patch creator would
>                 (probably) receive
>                 some advices during the review like "please fix also
>                 fix this and that"
>                 (which is normal, of course).
>
>                 So my preference was to make the changes package by
>                 package (in some
>                 reasonable amount of files) not postponing part of the
>                 changes for the
>                 future (sorry for these boring repeating review
>                 requests). Please note
>                 that all the above mentioned is *rather explanation of
>                 my motivation
>                 than objection* :) (and of course I used some text
>                 editor replace
>                 automation which is surely not so advanced as Perl).
>
>                    > It's probably correct, but I would have left it
>                 out of this change
>                 Yes, I see. Reverted (please update the web page):
>                 http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html
>                 <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/index.html>
>
>                 Thanks,
>                 Alexander
>
>                 P.S. The <tt> replacement job is mostly (I guess,
>                 ~80%) complete. But
>                 probably this approach should be used if some similar
>                 replacement task
>                 for, e.g., <code></code> tags would be planned in
>                 future (there are
>                 thousands of them).
>
>
>                 On 9/24/2015 6:10 AM, Martin Buchholz wrote:
>
>
>                     On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
>                     <alexander.v.stepanov at oracle.com
>                     <mailto:alexander.v.stepanov at oracle.com>
>                     <mailto:alexander.v.stepanov at oracle.com
>                     <mailto:alexander.v.stepanov at oracle.com>>> wrote:
>
>                           Hello,
>
>                           Could you please review the following fix
>                     http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/
>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/>
>                          
>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/>
>                     http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html
>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jaxws.00/index.html>
>                          
>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jaxws.00/index.html
>
>
>                           for
>                     https://bugs.openjdk.java.net/browse/JDK-8133651
>
>                           Just another portion of deprecated <tt> (and
>                     <xmp>) tags replaced
>                           with {@code }. Some misprints were also fixed.
>
>
>                     I'm biased of course, but I like the approach I
>                     took with
>                     blessed-modifier-order:
>                     - make the change completely automated
>                     - leave "human editing" for a separate change
>                     - publish the code used to make the automated
>                     change (in my case,
>                     typically a perl one-liner)
>
>
>                           The following (expected) changes were
>                     detected by specdiff:
>                           - removed needless dashes in java.util.Locale,
>                           - removed needless curly brace in
>                     xml.bind.annotation.XmlElementRef
>
>
>                     I would do a separate automated "removed needless
>                     dashes" changeset.
>
>
>                           Please let me know if the following changes
>                     are desirable or not:
>
>                     http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>                     <http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html>
>                           <
>                     http://cr.openjdk.java.net/%7Eavstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>
>
>
>
>                     This is an actual change to the behavior of this
>                     code - the
>                     maintainers of jconsole need to approve it. It's
>                     probably correct,
>                     but I would have left it out of this change. If
>                     you remove it, then I
>                     approve this change.
>
>
>
>




More information about the core-libs-dev mailing list