P.S.: RFR [9] 8133651: automated replacing of old-style tags in docs

Stuart Marks stuart.marks at oracle.com
Thu Oct 1 01:16:32 UTC 2015


Hi Alexander, Martin,

The challenge of Perl file slurping and Emacs one-liners was too much to bear.

This is Java, so one-liners are hardly possible. Still, there are a bunch of 
improvements that can be made to the Java version. (OK, and I'm showing off a bit.)

Take a look at this:

http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java

I haven't studied the output exhaustively, but it seems to do a reasonably good 
job for the common cases. I ran it over java.lang and I noticed a few cases 
where there is markup embedded within <code></code> text, which should be looked 
at more closely.

I don't particularly care if you use my version, but there are some techniques 
that I'd strongly recommend that you consider using in any such tool. In particular:

  - Pattern.DOTALL to do multi-line matches
  - Pattern.CASE_INSENSITIVE
  - try-with-resources to ensure that files are closed properly
  - NIO instead of old java.io APIs, particularly Files.walk() and streams
  - use Scanner to deal with input file buffering
  - Scanner's stream support (I recently added this to JDK 9)
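
To make the first four items concrete, here is a minimal sketch of how they might fit together. It is not the code at the URL above: the class name CodeTagSketch and its methods are made up for illustration, and the rule for what counts as a "simple" tag body is a conservative assumption. It also slurps each file with Files.readAllBytes() rather than Scanner, just to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch; names and the "simple body" rule are assumptions.
final class CodeTagSketch {
    // DOTALL lets '.' cross line breaks, so a tag pair may span lines.
    // CASE_INSENSITIVE covers <CODE>, <Code>, <TT>, ... in one pattern,
    // and also makes the \1 back-reference match regardless of case.
    private static final Pattern TAG_PAIR = Pattern.compile(
            "<(tt|code)>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Assumption: only non-empty bodies without markup, entities or braces
    // are safe to convert automatically; everything else is left for a human.
    private static boolean isSimple(String body) {
        return !body.isEmpty()
                && body.chars().noneMatch(c -> "<>&{}".indexOf(c) >= 0);
    }

    static String replaceTags(String text) {
        Matcher m = TAG_PAIR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(2);
            String repl = isSimple(body) ? "{@code " + body + "}" : m.group();
            // quoteReplacement: the body may contain '$' or '\'.
            m.appendReplacement(out, Matcher.quoteReplacement(repl));
        }
        m.appendTail(out);
        return out.toString();
    }

    static void rewriteTree(Path root) throws IOException {
        // try-with-resources closes the directory stream behind Files.walk().
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.toString().endsWith(".java"))
                 .forEach(CodeTagSketch::rewriteFile);
        }
    }

    private static void rewriteFile(Path file) {
        try {
            // ISO-8859-1 maps every byte to one char and back, so bytes the
            // regex does not touch are written out unchanged.
            String before = new String(Files.readAllBytes(file),
                                       StandardCharsets.ISO_8859_1);
            String after = replaceTags(before);
            if (!after.equals(before)) {
                Files.write(file, after.getBytes(StandardCharsets.ISO_8859_1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CodeTagSketch <source-root>");
            return;
        }
        rewriteTree(Paths.get(args[0]));
    }
}
```

Anything with nested markup, such as <code><i>x</i></code>, is deliberately left untouched by this sketch so that it shows up for manual review instead of being converted incorrectly.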

Enjoy,

s'marks


On 9/29/15 2:23 PM, Martin Buchholz wrote:
> Hi Alexander,
>
> your change looks good.  It's OK to have manual corrections for automated
> mega-changes like this, as long as they all revert changes.
>
> Random comments:
>
> Should you publish your specdiff?  I guess not - it would be empty!
>
>              while((s = br.readLine()) != null) {
>
> by matching only one line at a time, you lose the ability to make
> replacements that span lines.  Perlers like to "slurp" in the entire file
> as a single string.
>
>          s = s.replace( "<CODE>", tag1);
>          s = s.replace( "<Code>", tag1);
>          s = s.replace("</CODE>", tag2);
>          s = s.replace("</Code>", tag2);
>
> Why not use case-insensitive regex?
>
> Here's an emacs-lisp one-liner I've been known to use:
>
> (defun tt-code ()
>    (interactive)
>    (query-replace-regexp "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>" "{@code \\2}"))
>
> With more work, one can automate transformation of embedded things like &lt;
>
> But of course, it's not even possible to transform ALL uses of <code> to
> {@code, if there was imaginative use of nested html tags.
>
>
> On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov <
> alexander.v.stepanov at oracle.com> wrote:
>
>> Updated: a few manual corrections were made (as @linkplain tags display
>> nested {@code } literally):
>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>> - checked with specdiff (which of course does not cover documentation for
>> internal packages), no unexpected diffs detected.
>>
>> Regards,
>> Alexander
>>
>>
>> On 9/27/2015 4:52 PM, Alexander Stepanov wrote:
>>
>>> Hello Martin,
>>>
>>> Here is a simple app to replace <code></code> tags with the new-style
>>> {@code } ones (definitely not as elegant as the Perl one-liners):
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java
>>>
>>> Corresponding patch for jdk and replacement log (~62k of the tag changes):
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log
>>> (sorry, I still have to check the correctness of the patch with specdiff,
>>> so this is rather a demo at the moment).
>>>
>>> I don't know whether these changes (cosmetic by nature) are desired for
>>> now or not. Moreover, some of them should probably go to other repos (e.g.,
>>> awt, swing -> "client" instead of "dev").
>>>
>>> Regards,
>>> Alexander
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: alexander.v.stepanov at oracle.com
>>> To: martinrb at google.com
>>> Cc: core-libs-dev at openjdk.java.net
>>> Sent: Thursday, September 24, 2015 16:06:56 GMT +03:00 Moscow,
>>> St. Petersburg, Volgograd
>>> Subject: Re: RFR [9] 8133651: replace some <tt> tags (obsolete in html5) in
>>> core-libs docs
>>>
>>> Hello Martin,
>>>
>>> Thank you for the review and for the notes!
>>>
>>>    > I'm biased of course, but I like the approach I took with
>>> blessed-modifier-order:
>>>    > - make the change completely automated
>>>    > - leave "human editing" for a separate change
>>>    > - publish the code used to make the automated change (in my case,
>>> typically a perl one-liner)
>>>
>>> Automated replacement has an obvious advantage: it is fast and massive.
>>> But there are some disadvantages at the same time (just IMHO).
>>>
>>> Using a script, it is quite easy to miss some non-trivial cases, e.g.:
>>> - remove unnecessary linebreaks, like
>>>     * <tt>someCode
>>>     * </tt>
>>> (which would be better to replace with single-line {@code someCode});
>>> - joining of successive terms, like "<tt>ONE</tt>, <tt>TWO</tt>,
>>> <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
>>> - errors like an extra or missing "<" or ">", e.g. "* <tt>Collection
>>> <T></tt>" (there were a lot of them);
>>> - some cases when <tt></tt> should be replaced with <code></code>, not
>>> {@code } (e.g. because of unicode characters inside of code etc.);
>>> - extra tags inside of <tt> or <code> which should be moved outside of
>>> {@code }, like <tt><i>someCode</i></tt> or <tt><b>someCode</b></tt>;
>>> - simple removing of needless tags, like "<tt>{@link ...}</tt>" ->
>>> "{@link ...}";
>>> - replace HTML codes with symbols ('<', '>', '@', ...)
>>> - etc.
>>> - plus some other formatting changes and fixes for misprints which would
>>> be omitted during the automated replacement (and wouldn't be done in
>>> future manually because there is no motivation for repeated processing).
>>>
>>> So it may sometimes be difficult to say where the border between
>>> "trivial" and "human-editing" cases lies (and the portion of "non-trivial
>>> cases" is definitely not minor); moreover, even an automated
>>> replacement requires careful review before the webrev is published
>>> (as well as by reviewers, who probably wouldn't be happy to review
>>> hundreds of files at once) and iterative checks/corrections.
>>> specdiff is very useful for this task but cannot fully cover the
>>> diffs either (as some changes are in the internal com/... sun/...
>>> packages).
>>>
>>> Moreover, I'm sure that some reviewers would be annoyed by the fact
>>> that some (quite simple) changes were postponed because they are "not
>>> too trivial to be fixed just now" (because they would suspect the changes
>>> would be postponed forever). So the patch creator would (probably) receive
>>> some advice during the review like "please also fix this and that"
>>> (which is normal, of course).
>>>
>>> So my preference was to make the changes package by package (a
>>> reasonable number of files at a time), not postponing part of the changes
>>> for the future (sorry for these boring repeated review requests). Please
>>> note that all of the above is *an explanation of my motivation rather
>>> than an objection* :) (and of course I used some text editor replace
>>> automation, which is surely not as advanced as Perl).
>>>
>>>    > It's probably correct, but I would have left it out of this change
>>> Yes, I see. Reverted (please update the web page):
>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html
>>>
>>> Thanks,
>>> Alexander
>>>
>>> P.S. The <tt> replacement job is mostly (I guess, ~80%) complete. But
>>> this approach should probably be used if a similar replacement task
>>> for, e.g., <code></code> tags is planned in the future (there are
>>> thousands of them).
>>>
>>>
>>> On 9/24/2015 6:10 AM, Martin Buchholz wrote:
>>>
>>>>
>>>> On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
>>>> <alexander.v.stepanov at oracle.com> wrote:
>>>>
>>>>       Hello,
>>>>
>>>>       Could you please review the following fix
>>>>       http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/
>>>>       http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html
>>>>       for
>>>>       https://bugs.openjdk.java.net/browse/JDK-8133651
>>>>
>>>>       Just another portion of deprecated <tt> (and <xmp>) tags replaced
>>>>       with {@code }. Some misprints were also fixed.
>>>>
>>>>
>>>> I'm biased of course, but I like the approach I took with
>>>> blessed-modifier-order:
>>>> - make the change completely automated
>>>> - leave "human editing" for a separate change
>>>> - publish the code used to make the automated change (in my case,
>>>> typically a perl one-liner)
>>>>
>>>>
>>>>       The following (expected) changes were detected by specdiff:
>>>>       - removed needless dashes in java.util.Locale,
>>>>       - removed needless curly brace in xml.bind.annotation.XmlElementRef
>>>>
>>>>
>>>> I would do a separate automated "removed needless dashes" changeset.
>>>>
>>>>
>>>>       Please let me know if the following changes are desirable or not:
>>>>
>>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>>>>
>>>>
>>>> This is an actual change to the behavior of this code - the
>>>> maintainers of jconsole need to approve it.  It's probably correct,
>>>> but I would have left it out of this change. If you remove it, then I
>>>> approve this change.
>>>>
>>>
>>


