Spec and API review for {Int,Long,Double}SummaryStatistics

Jim Mayer jim at pentastich.org
Sun Mar 31 20:28:39 PDT 2013


Comments below on the following message to lambda-lib-spec-experts.

On Fri, Mar 29, 2013 at 2:01 PM, Kevin Bourrillion <kevinb at google.com> wrote:

> On Thu, Mar 28, 2013 at 6:37 PM, Mike Duigou <mike.duigou at oracle.com>
> wrote:
>
> ...
>
> > One comment which was not addressed was whether getAverage() should throw
> > a zero division ArithmeticException if no values had been recorded. I
> > believe the current default of returning 0.0 is reasonable and it is
> > convenient to not have to check the catch the exception. It's also in line
> > with the defaults we provide for sum, sumOfSquares, min, and max.
>
>
> I think I've said this before, but I believe this is extremely wrong.  sum
> and sumOfSquares have a well-defined and obvious identity.  min, max and
> average are entirely meaningless when applied to zero values.  What would
> you think of a language where 1 / 0 returned 0?  How can we claim this is
> any different?  I believe no one will ever curse your name for throwing the
> exception.
>

I strongly agree that zero is not a good default for getAverage when no
values have been recorded.  My personal experience has been that hiding
errors because handling exceptions is painful just leads to subtle,
hard-to-find bugs.  If the experts committee wants to avoid throwing
exceptions then I'd suggest returning NaN.

If that seems wrong, then just leave getAverage out... it's easy enough to
compute.
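
To make the two alternatives concrete, here is a minimal sketch of an
accumulator that reports "no data" either by returning NaN or by throwing.
The class AverageSketch and its method names are hypothetical, made up for
illustration; this is not the proposed JDK class.

```java
/** Hypothetical accumulator (not the JDK class): two ways to signal "no data". */
final class AverageSketch {
    private long count;
    private double sum;

    /** Records one value. */
    void accept(double value) {
        count++;
        sum += value;
    }

    /** Returns NaN when nothing has been recorded, instead of a misleading 0.0. */
    double averageOrNaN() {
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Throws when nothing has been recorded, for callers who want a loud failure. */
    double averageOrThrow() {
        if (count == 0) {
            throw new ArithmeticException("average of zero values is undefined");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        AverageSketch stats = new AverageSketch();
        // With no values recorded, the NaN variant propagates visibly
        // through any later arithmetic rather than looking like real data.
        System.out.println(stats.averageOrNaN());
        stats.accept(2.0);
        stats.accept(4.0);
        System.out.println(stats.averageOrThrow());
    }
}
```

Either variant keeps an empty data set distinguishable from one whose
average really is zero, which returning 0.0 does not.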

Also, while I'm here...
>
> Exposing sumOfSquares() does not permit users to safely calculate variance,
> which I believe makes it fairly useless and even dangerous:
>
> "The failure of Cauchy's fundamental inequality is another important
> example of the breakdown of traditional algebra in the presence of floating
> point arithmetic...Novice programmers who calculate the standard deviation
> of some observations by using the textbook formula [formula for the
> standard deviation in terms of the sum of squares] often find themselves
> taking the square root of a negative number!"  (Knuth AoCP vol 2, section
> 4.2.2)
>

I agree with this as well.  I would prefer to leave the method out
altogether and wait for some numeric types to implement a robust
statistics library.  Standard deviations are horribly overused anyway
because they are so familiar and so keep being used to "describe"
non-normal distributions.
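
For anyone who wants to see the cancellation problem, the sketch below
compares the textbook sum-of-squares formula with Welford's one-pass update,
which is one well-known numerically stable alternative.  The class
VarianceSketch and its method names are hypothetical and not part of any
proposed API; both methods compute the population variance (dividing by n).

```java
/** Hypothetical helper (not part of the proposed API): two ways to compute variance. */
final class VarianceSketch {

    /** Textbook formula (sumSq - sum*sum/n) / n; loses precision when the mean is large. */
    static double naiveVariance(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        int n = xs.length;
        // Two huge, nearly equal numbers are subtracted here, so rounding
        // error can swamp the result and may even make it negative.
        return (sumSq - sum * sum / n) / n;
    }

    /** Welford's update: tracks the running mean and the sum of squared deviations. */
    static double welfordVariance(double[] xs) {
        double mean = 0.0;
        double m2 = 0.0; // sum of squared deviations from the running mean
        long n = 0;
        for (double x : xs) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return n == 0 ? Double.NaN : m2 / n;
    }

    public static void main(String[] args) {
        // Small spread around a large mean; the exact population variance is 22.5.
        double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        System.out.println("naive:   " + naiveVariance(xs));
        System.out.println("welford: " + welfordVariance(xs));
    }
}
```

With only sum and sumOfSquares exposed, a caller has no way to reach the
second form, which is the point of the objection above.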

Jim Mayer

