P.S.: RFR [9] 8133651: automated replacing of old-style tags in docs
Alexander Stepanov
alexander.v.stepanov at oracle.com
Thu Oct 1 11:31:39 UTC 2015
Hello Martin, Stuart,
Thank you for the notes.
Yes, the initial utility is quite ugly; I just tried to prepare it as
quickly as possible, hoping that it would cover the majority of "trivial"
replacement cases. And yes, it does not process multi-line <code> inclusions.
> s = s.replace( "<CODE>", tag1);
> s = s.replace( "<Code>", tag1);
> s = s.replace("</CODE>", tag2);
> s = s.replace("</Code>", tag2);
- replaced with "s = ln.replaceAll("(?i)<code>",
"<code>").replaceAll("(?i)</code>", "</code>");"
Unfortunately my Perl/Lisp knowledge is zero :)
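As an illustration only, the case-insensitive normalization mentioned above could look like the following minimal sketch. The class and method names are hypothetical and are not taken from SimpleTagEditor; the sketch only folds the tag spelling to lower case and does not yet convert anything to {@code }.

```java
import java.util.regex.Pattern;

class TagCaseSketch {
    // Precompiled, case-insensitive patterns for the opening and closing tag.
    private static final Pattern OPEN =
            Pattern.compile("<code>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CLOSE =
            Pattern.compile("</code>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: rewrites <CODE>, <Code>, ... to <code>,
    // and likewise for the closing tag.
    static String normalize(String line) {
        String s = OPEN.matcher(line).replaceAll("<code>");
        return CLOSE.matcher(s).replaceAll("</code>");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Returns <CODE>true</CODE> or <Code>false</Code>."));
    }
}
```

Compiling the patterns once avoids recompiling the regex for every line, which matters when the tool is run over thousands of files.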
> Should you publish your specdiff? I guess not - it would be empty!
For now it contains a single diff for a fixed misprint, but there are a few
other misprints at the moment which I'd like to include in the patch
as well.
So if you don't have objections, I'll wait for several days and then
publish a final RFR (probably containing changes in some other repos
like jaxws, corba or jaxp) which would be more formal (containing the bug #
and the final specdiff report).
Thanks again,
Alexander
On 10/1/2015 9:54 AM, Martin Buchholz wrote:
> Hi s'marks,
> You probably don't need to absolutify paths.
> And you can easily handle multiple args.
>
> (just for fun!)
> Checks for javadoc comment; handles popular html entities; handles
> multiple lines; handles both tt and code:
>
> #!/bin/bash
> find "$@" -name '*.java' | \
> xargs -r perl -p0777i -e \
> 'do {} while s~^ *\*.*\K<(tt|code)>((?:[^<>{}\&\@]|&(?:lt|gt|amp);)*)</\1>~$_=$2;
> s/&lt;/</g; s/&gt;/>/g; s/&amp;/&/g; "{\@code $_}"~mgie'
>
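
For readers who do not speak Perl, here is a rough Java sketch of the same idea. It is an illustration only, not a translation of the script above: the class and method names are hypothetical, and the check that the tag sits on a Javadoc comment line (the "^ *\*.*\K" part) is left out as a simplification. A tag pair is converted only if its body contains no markup, braces or '@', and no entities other than lt, gt and amp.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeTagSketch {
    // Tag names are matched case-insensitively; the body may span lines
    // but may contain only plain characters or the entities lt, gt, amp.
    private static final Pattern TAG = Pattern.compile(
            "<((?i:tt|code))>((?:[^<>{}&@]|&(?:lt|gt|amp);)*)</(?i:\\1)>");

    static String convert(String text) {
        Matcher m = TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Decode the entities; &amp; goes last so that "&amp;lt;"
            // becomes "&lt;" rather than "<".
            String body = m.group(2)
                    .replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&amp;", "&");
            // quoteReplacement keeps '$' and '\' in the body literal.
            m.appendReplacement(out,
                    Matcher.quoteReplacement("{@code " + body + "}"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert(" * Returns <tt>a &lt; b</tt> or <code>null</code>."));
        // Nested markup does not match and is left for manual editing.
        System.out.println(convert(" * See <tt><i>name</i></tt>."));
    }
}
```

Anything the pattern does not match is passed through unchanged, so the remaining <tt> and <code> occurrences can be found with a plain search afterwards.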
>
> On Wed, Sep 30, 2015 at 6:16 PM, Stuart Marks <stuart.marks at oracle.com> wrote:
>
> Hi Alexander, Martin,
>
> The challenge of Perl file slurping and Emacs one-liners was too
> much to bear.
>
> This is Java, so one-liners are hardly possible. Still, there are
> a bunch of improvements that can be made to the Java version. (OK,
> and I'm showing off a bit.)
>
> Take a look at this:
>
> http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java
>
> I haven't studied the output exhaustively, but it seems to do a
> reasonably good job for the common cases. I ran it over java.lang
> and I noticed a few cases where there is markup embedded within
> <code></code> text, which should be looked at more closely.
>
> I don't particularly care if you use my version, but there are
> some techniques that I'd strongly recommend that you consider
> using in any such tool. In particular:
>
> - Pattern.DOTALL to do multi-line matches
> - Pattern.CASE_INSENSITIVE
> - try-with-resources to ensure that files are closed properly
> - NIO instead of old java.io APIs, particularly
> Files.walk() and streams
> - use Scanner to deal with input file buffering
> - Scanner's stream support (I recently added this to JDK 9)
>
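
Several of the techniques listed above (Pattern.DOTALL, Pattern.CASE_INSENSITIVE, try-with-resources, Files.walk()) can be sketched together as follows. This is only an illustration and not the code of SimpleTagEditorSmarks1.java: the class and method names are hypothetical, UTF-8 source files are assumed, and Scanner is not used here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class WalkAndReplaceSketch {
    // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE accepts <CODE>, <Code>, ...
    private static final Pattern CODE = Pattern.compile(
            "<code>(.*?)</code>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    // Bodies containing any of these are left for manual editing.
    private static final Pattern UNSAFE = Pattern.compile("[<>{}&@]");

    static String rewrite(String text) {
        Matcher m = CODE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            boolean simple = !UNSAFE.matcher(body).find();
            String replacement = simple ? "{@code " + body + "}" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Reads the whole file as one string (assumption: UTF-8) and
    // writes it back only if something changed.
    static void rewriteFile(Path file) {
        try {
            String before = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String after = rewrite(before);
            if (!after.equals(before)) {
                Files.write(file, after.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void rewriteTree(Path root) throws IOException {
        // Files.walk holds open directory handles, so the stream is closed
        // with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.toString().endsWith(".java"))
                 .forEach(WalkAndReplaceSketch::rewriteFile);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            rewriteTree(Paths.get(args[0])); // caution: edits files in place
        } else {
            System.out.println(rewrite(" * <CODE>foo\n * bar</CODE>"));
        }
    }
}
```

Reading each file as a single string is what makes the multi-line matches possible; a line-by-line loop cannot see a tag pair that is split across two lines.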
> Enjoy,
>
> s'marks
>
>
>
> On 9/29/15 2:23 PM, Martin Buchholz wrote:
>
> Hi Alexander,
>
> your change looks good. It's OK to have manual corrections
> for automated
> mega-changes like this, as long as they all revert changes.
>
> Random comments:
>
> Should you publish your specdiff? I guess not - it would be
> empty!
>
> while((s = br.readLine()) != null) {
>
> by matching only one line at a time, you lose the ability to make
> replacements that span lines. Perlers like to "slurp" in the
> entire file
> as a single string.
>
> s = s.replace( "<CODE>", tag1);
> s = s.replace( "<Code>", tag1);
> s = s.replace("</CODE>", tag2);
> s = s.replace("</Code>", tag2);
>
> Why not use case-insensitive regex?
>
> Here's an emacs-lisp one-liner I've been known to use:
>
> (defun tt-code ()
> (interactive)
> (query-replace-regexp
> "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>" "{@code \\2}"))
>
> With more work, one can automate transformation of embedded
> things like &lt;
>
> But of course, it's not even possible to transform ALL uses of
> <code> to
> {@code, if there was imaginative use of nested html tags.
>
>
> On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov
> <alexander.v.stepanov at oracle.com> wrote:
>
> Updated: a few manual corrections were made (as @linkplain
> tags display
> nested {@code } literally):
> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
> - checked with specdiff (which of course does not cover
> documentation for
> internal packages), no unexpected diffs detected.
>
> Regards,
> Alexander
>
>
> On 9/27/2015 4:52 PM, Alexander Stepanov wrote:
>
> Hello Martin,
>
> Here is a simple app to replace <code></code> tags
> with the new-style
> {@code } one (which is definitely not as elegant as
> the Perl one-liners):
> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java
>
> Corresponding patch for jdk and replacement log (~62k
> of the tag changes):
> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log
> (sorry, I have yet to check the correctness of the patch
> with specdiff,
> so this is rather a demo at the moment).
>
> I don't know whether these changes (cosmetic by nature) are
> desired for now or
> not. Moreover, some part of them should probably go to
> other repos (e.g.,
> awt, swing -> "client" instead of "dev").
>
> Regards,
> Alexander
>
>
>
> ----- Original Message -----
> From: alexander.v.stepanov at oracle.com
> To: martinrb at google.com
> Cc: core-libs-dev at openjdk.java.net
> Sent: Thursday, 24 September 2015, 16:06:56 GMT
> +03:00 Moscow,
> St. Petersburg, Volgograd
> Subject: Re: RFR [9] 8133651: replace some <tt> tags
> (obsolete in html5) in
> core-libs docs
>
> Hello Martin,
>
> Thank you for review and for the notes!
>
> > I'm biased of course, but I like the approach I
> took with
> blessed-modifier-order:
> > - make the change completely automated
> > - leave "human editing" for a separate change
> > - publish the code used to make the automated
> change (in my case,
> typically a perl one-liner)
>
> Automated replacement has an obvious advantage: it is
> fast and massive.
> But there are some disadvantages at the same time
> (just IMHO).
>
> With a script it is quite easy to miss some less
> trivial cases, e.g.:
> - remove unnecessary linebreaks, like
> * <tt>someCode
> * </tt>
> (which would be better to replace with a single-line
> {@code someCode});
> - joining of successive terms, like "<tt>ONE</tt>,
> <tt>TWO</tt>,
> <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
> - errors like extra or missing "<" or ">": *
> <tt>Collection
> <T></tt>", - there were a lot of them;
> - some cases when <tt></tt> should be replaced with
> <code></code>, not
> {@code } (e.g. because of unicode characters inside of
> code etc.);
> - extra tags inside of <tt> or <code> which should be
> moved outside of
> {@code }, like <tt><i>someCode</i></tt> or
> <tt><b>someCode</b></tt>;
> - simple removing of needless tags, like "<tt>{@link
> ...}</tt>" ->
> "{@link ...}";
> - replace HTML codes with symbols ('<', '>', '@', ...)
> - etc.
> - plus some other formatting changes and fixes for
> misprints which would
> be omitted during the automated replacement (and
> wouldn't be done in
> future manually because there is no motivation for
> repeated processing).
>
> So sometimes it may be difficult to say where is the
> border between
> "trivial" and "human-editing" cases (and the portion
> of "non-trivial
> cases" is definitely not minor); moreover, even the
> automated
> replacement requires the subsequent careful review
> before publishing of
> webrev (as well as by reviewers who probably wouldn't
> be happy to review
> hundreds of files at the same time) and iterative
> checks/corrections.
> specdiff is very useful for this task but also cannot
> fully cover the
> diffs (as some changes are situated in the internal
> com/... sun/...
> packages).
>
> Moreover, I'm sure that some reviewers would be
> annoyed with the fact
> that some (quite simple) changes were postponed
> because they are "not
> too trivial to be fixed just now" (because they will
> suspect they would
> be postponed forever). So the patch creator would
> (probably) receive
> some advice during the review like "please also
> fix this and that"
> (which is normal, of course).
>
> So my preference was to make the changes package by
> package (in some
> reasonable amount of files) not postponing part of the
> changes for the
> future (sorry for these boring repeating review
> requests). Please note
> that all the above mentioned is *rather explanation of
> my motivation
> than objection* :) (and of course I used some text
> editor replace
> automation which is surely not so advanced as Perl).
>
> > It's probably correct, but I would have left it
> out of this change
> Yes, I see. Reverted (please update the web page):
> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html
>
> Thanks,
> Alexander
>
> P.S. The <tt> replacement job is mostly (I guess,
> ~80%) complete. But
> probably this approach should be used if some similar
> replacement task
> for, e.g., <code></code> tags would be planned in
> future (there are
> thousands of them).
>
>
> On 9/24/2015 6:10 AM, Martin Buchholz wrote:
>
>
> On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
> <alexander.v.stepanov at oracle.com> wrote:
>
> Hello,
>
> Could you please review the following fix
> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/
> http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html
>
> for
> https://bugs.openjdk.java.net/browse/JDK-8133651
>
> Just another portion of deprecated <tt> (and
> <xmp>) tags replaced
> with {@code }. Some misprints were also fixed.
>
>
> I'm biased of course, but I like the approach I
> took with
> blessed-modifier-order:
> - make the change completely automated
> - leave "human editing" for a separate change
> - publish the code used to make the automated
> change (in my case,
> typically a perl one-liner)
>
>
> The following (expected) changes were
> detected by specdiff:
> - removed needless dashes in java.util.Locale,
> - removed needless curly brace in
> xml.bind.annotation.XmlElementRef
>
>
> I would do a separate automated "removed needless
> dashes" changeset.
>
>
> Please let me know if the following changes
> are desirable or not:
>
> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>
>
>
> This is an actual change to the behavior of this
> code - the
> maintainers of jconsole need to approve it. It's
> probably correct,
> but I would have left it out of this change. If
> you remove it, then I
> approve this change.
>
>
>
>