[threeten-dev] Review for parsing 2 to 4 years #218

Fri Apr 26 08:14:50 PDT 2013

Hi,

On 4/26/2013 10:21 AM, Stephen Colebourne wrote:
> Your parseAdjacent test in TestReducedPrinter should be in TestReducedParser
It is misnamed, it is a printing test to confirm that the output is 2 digits
when the value is in range of the baseValue and the truncates to 
maxWidth otherwise.
>
> Looking at the TestReducedParser.parseAll test cases, I'd expect this
> {YEAR, 2, 2010, "123", 0, 2, 2012},
> to parse year 123 in lenient mode
Except that with fixed width of 2, lenient is disabled, see below
>
> {YEAR, 1, 2010, "10", 0, 1, 1},
> {YEAR, 1, 2005, "21", 0, 1, 2},
> Similarly, these should parse 10 and 21 in lenient mode.
>
> {YEAR, 2, 2010, "321", 0, 2, 2032},
> {YEAR, 2, 2010, "54321", 0, 2, 2054},
> Simlarly, parsing 321 and 54321 in lenient mode.
>
> The parseStrict cases result in weird things
> {YEAR, 1, 2, 2010, "12", 0, 1, 1},
> {YEAR, 1, 4, 2010, "21", 0, 1, 2},
> Thats because the second method with two widths doesn't make sense here.
>
> I still think we should delete the method with the extra argument. The
> method with a single width should parse up to a sensible maximum width
> in lenient mode (where permitted by adjacent value parsing).
The problem arises that with a single width min=max, the logic for min 
== max
never does the setup of valueParserIndex that accumulates subsequentWidths.
So the number parser in lenient mode consumes all the input and leaves
nothing for the rest of the fields.

Only a variable width field triggers the logic for adjacent value mode.
See appendValue(field, min, max, signstyle)

I coded that up and it didn't work.  I opted not to try to redesign
that mechanism but make the reduceValue not a fixed width field.

>
> I don't think the Javadoc of the original method now describes what
> happens when leniently parsing less than the specified width. It
> should clearly desribe strict vs lenient, and lenient matching width
> vs lenient absorb anything available.
>
>
> The maximum field width is generally 19, to allow for long values. But
> 10 (for int values) is also avalid option if documented.
>
> NumberPrinterParser should also parse up to 19 digits in lenient mode
> (where permitted by adjacent value parsing).
As above, a fixed width field can't be made lenient because the 
subsequentWidth
is never accumulated after a fixedWidth field.

Suggestions?

Thanks, Roger

>
> Stephen
>
>
>
> On 25 April 2013 23:06, roger riggs <roger.riggs at oracle.com> wrote:
>> Hi,
>>
>> Updated as recommended below:
>>
>> Webrev:
>>      http://cr.openjdk.java.net/~rriggs/webrev-lenient-parse-218/
>>
>> One open question is whether lenient parsing of numbers (including
>> ValueReduced) should respect the maxWidth.
>> Currently, NumberPrinterParser uses maxWidth to limit the width of the
>> parse.
>> The ReducedPrinterParser needs an upper bound on the width to parse
>> leniently.
>>
>> Another minor issue is the range of text width and range of BaseValue.
>> The original code allowed field widths of up to 19 but the baseValue
>> was limited to an int and the table of powers of ten was limited to 10
>> digits.
>> Is there a recommended range for fields that might use ReducedValue parsing?
>>
>> Thanks, Roger
>>
>>
>>
>> On 4/25/2013 6:07 AM, Stephen Colebourne wrote:
>>> My concern is that this change caters too much for the edge case and
>>> not enough for the typical use case. In the typical use case, the user
>>> just wants to format/parse a fixed 2 digit year. The change to the
>>> builder now means they have to specify two widths, not one. We should
>>> retain the method with one width argument at the very least.
>>>
>>> Next, it seems that the behaviour is different to SimpleDateFormat,
>>> where parsing anything other than the specified width results in a
>>> pure year, not a year merged with the base year. At the very least, we
>>> need a way to replicate that, so our patterns match SDF.
>>>
>>> This would seem sensible behaviour:
>>> * appendValueReduced(YEAR, 2, 2000)
>>> * format = lowest 2 digits of year, zero padded if necessary
>>> * parse-strict = only accept two digits and apply base year
>>> * parse-lenient alone (yy/MM) = parse as though it were
>>> appendValue(YEAR). If 2 digits parsed then apply base year.
>>> * parse-lenient not adjacent (yyMMdd) = parse as though it were
>>> appendValue(YEAR) with 4 reserved digits for MMdd. If 2 digits parsed
>>> then apply base year.
>>> * parse-lenient adjacent (MMddyy) = same as parse-strict
>>>
>>> Beyond that, the new method you're proposing seems to be controlling
>>> the maximum width of parsing allowed in lenient mode. I guess that is
>>> fine, but I'm not sure it would see wide applicability. We generally
>>> just say lenient is lenient, and I don't think a maxWidth concept
>>> should apply at all in strict mode in the context of "reduced".
>>>
>>> Sorry this change seems more complex than I expected.
>>> Stephen
>>>
>>>
>>>
>>> On 24 April 2013 23:00, roger riggs <roger.riggs at oracle.com> wrote:
>>>> Hi,
>>>>
>>>> Cleanup of the changes for addValueReduced and the ReducedPrinterParser.
>>>> The parseStrict and parseLenient logic re-introduced so that by default
>>>> the behavior of appendPattern("yyMMdd") will result in a fixed width
>>>> parse
>>>> unless setLenient is called in the DateTimeFormatterBuilder.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~rriggs/webrev-lenient-parse-218/
>>>>
>>>> Please review.
>>>>
>>>> Thanks, Roger
>>>>
>>>> [1] https://github.com/ThreeTen/threeten/issues/218
>>