[threeten-dev] Review for parsing 2 to 4 years #218

Mon Apr 29 23:24:02 PDT 2013

I think the core logic here is now right. In particular its really
good that there is no difference internally in appendValue() with
fixed/non-fixed width.

DTFB line 559 needs tweaking to make sense:
"If there is a value PrinterParser is not active then the
PrinterParser becomes the active PP;"

DTFB lines 1370 (Javadoc) and line 1658 (code) differ.

The Javadoc for the two appendValueReduced methods doesn't make it
clear enough to me as to their difference. It seems to me that the
simpler method is useful for all normal cases. In other places I've
got a paragraph for formatting, then a paragraph for parsing - that
pattern would be beneficial here. Line 468-470 seems wrong now. Para
line 476 also looks wrong.

DTFB line 2685, 2715, 496, 543 Javadoc differ from what the code accepts.

In TestReducedParser, the parseStrict tests under "Negative baseValue"
do not all have negative bases, and perhaps they should. Same with
parseLenient

The TestReducedParser lenient tests has this
{YEAR, 2, 4, 2000, "-10", 0, 3, 2090},
which I think should be
{YEAR, 2, 4, 2000, "-10", 0, 3, -10},
as we should only map 01 to 99 to the base value (unless
SimpleDateFormat does what your patch proposes)

While this
{"ddMMyy", "25062001",  LENIENT, 0, 8, 2001, 20, 2506},
is clearly "wrong" as far as a human might read it, it is right as far
as the rules defined are concerned. We could only get better if we
used the knowledge of month range and/or have separate append methods
for fixed width. That might be desirable, but is a lot more
complicated.

So, with all the great logic now in the appendValue and
appendValueReduced, the only things that the new reduced method
provide AFAICT are:
- allow strict parsing to be more lenient
- allow lenient parsing to control the maximum width

I guess that is sufficient behaviour to justify the new method,
although I suspect it will be little used.
thanks
Stephen

On 29 April 2013 22:54, roger riggs <roger.riggs at oracle.com> wrote:
> Hi,
>
> Webrev:
>    http://cr.openjdk.java.net/~rriggs/webrev-lenient-parse-218/
>
> The logic for adjacent value mode has been consolidated a single method
> that uses the current state of active parser and the new PrinterParser
> to decide how to adjust the active and new PrinterParser.
>
> The first value printer parser becomes the active PrinterParser (and if
> parsing
> is set to lenient may consume a variable number of digits), subsequent
> fixed width fields are forced to remain fixed width and their width
> is contributed to the first value field to prevent unbounded digit
> consumption.
> If a variable width field is appended to the sequence, the initial active
> value
> field is modified to be fixed width so that the entire sequence is fixed and
> the new variable width field becomes the active printer parser.
> A ValueReducedPrinterParser is treated no differently.
>
> Please review the updated test cases for correctness in the lenient
> and strict modes and the other changes.
>
> Thanks, Roger
>
>
>
> On 4/28/2013 10:09 PM, Roger Riggs wrote:
>>
>> Hi,
>>
>> The remaining problem is that the pattern "MMdd" or the equivalent
>> builder appends of "Value(2), Value(2)", is not recognized as using
>> adjacent parsing.  When set to lenient the NumberPP parse of "MM"
>> gobbles up all the input leaving nothing for 'dd'.
>> (This is a new issue since NumberPP in lenient mode is no longer limited
>> to maxWidth).
>>
>> If the field is marked as withfixedWidth (subsequentWidth == -1) the field
>> is no longer lenient.
>>
>> The problem that remains is that a field identified by the builder
>> (minWidth == maxWidth)
>> as fixed width may not actually be fixed width at the time the PP is
>> invoked.
>>
>> I am inclined to think that adjacent value mode must always be engaged,
>> even for fixed fields.  The first fixed field must be informed that it is
>> followed by
>> fixed fields.
>>
>> Roger
>>
>>
>>
>> On 4/26/2013 4:00 PM, Stephen Colebourne wrote:
>>>
>>>
>>> On 26 Apr 2013 20:32, "roger riggs" <roger.riggs at oracle.com
>>> <mailto:roger.riggs at oracle.com>> wrote:
>>> > Start with a hard one.  "yyMMdd" in the case where
>>> > ReducedValue/NumberPP is set to lenient.
>>> >
>>> > "76321" could be parsed as (76, 3, 21) or (76, 32, 1)
>>>
>>> Should be 7, 63, 21
>>>
>>> > "765432" could be parsed several ways.  (76, 5, 432) (76, 54, 32), etc.
>>>
>>> 76 54 32
>>>
>>> The first is variable and the others fixed. Thus lenient could only
>>> impact the first if others are fixed width.
>>> Stephen
>>>
>>> > The subsequentWidth mechanism only works for fixed fields
>>> > following a variable width field.  If the parsing is set to lenient
>>> > then what was a fixed field is now variable but the static mechanism
>>> > for lookahead can't anticipate that.
>>> >
>>> > The current mechanism works fine as long as fixed width fields
>>> > have the same width whether lenient or strict.  I think it is
>>> > unnecessary
>>> > and too big a change to modify NumberPP to extend lenient parsing to
>>> > fixed width fields.
>>> >
>>> > Roger
>>> >
>>> >
>>> >
>>> >
>>> > On 4/26/2013 2:13 PM, Stephen Colebourne wrote:
>>> >>
>>> >> On 26 April 2013 16:14, roger riggs <roger.riggs at oracle.com
>>> >> <mailto:roger.riggs at oracle.com>> wrote:
>>> >>>
>>> >>> Suggestions?
>>> >>
>>> >> I think we should start with a set of test cases we agree on. Then
>>> >> whatever code needs to be changed will have to be changed. It sounds
>>> >> like the adjacent value parsing mechanism will need some tweaking.
>>> >>
>>> >> Stephen
>>> >
>>> >
>>>
>>
>>
>