-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deterministic sum #1
Comments
If the partial values «p0, p1, …, pk-1» can differ in sign, then it's possible to get obscure cases where the mathematical value about to be stored into p0 is on a knife edge that rounds to ±∞ while p1 would have the opposite sign that would cause the rounding to prefer the largest-magnitude finite value. This would cause a problem for the pick-next-addend-with-opposite-sign approach, but would be handled by the single-pass scaling-down-p0 approach. |
Here is an implementation using the single-pass scaling approach. This was indeed a bit tedious, but I think it's correct - I also wrote code to convert doubles to/from BigInts (scaled up by 2**1023), in which domain we can do addition/subtraction precisely, and used this to fuzz my implementation. This caught a few nasty edge cases. It's possible I might still have missed a case somewhere. Note that, following Python, the partials are in ascending order of magnitude, so the "overflow" partial is conceptually last in this implementation, not first. |
In the original issue for adding fsum to python, there is attached a version based on a different approach:
Because this approach doesn't use 2Sum, it doesn't have issues with overflow. I've extracted it from the diff in the thread. I'm probably not going to bother implementing this in JS as well, but it does prove the possibility. lsum.c/* Full precision summation of a sequence of floats. Based on the
function lsum by Raymond Hettinger at
<http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090>,
and following implementation suggestions from Tim Peters.
Method: maintain the sum so far in the form
huge_integer * 2**FSUM_MIN_EXP
where FSUM_MIN_EXP is such that every finite double can be
expressed as an integral multiple of 2**FSUM_MIN_EXP. huge_integer
is stored in base FSUM_BASE. Following a suggestion from Tim
Peters, we use a signed-digit representation: the digits for the
base FSUM_BASE representation all lie in the range [-FSUM_BASE/2,
FSUM_BASE/2). It's not hard to show that every integer (positive
or negative), can be expressed uniquely in the form
a0 + a1*FSUM_BASE + a2*FSUM_BASE**2 + ...
This choice of representation makes it easy to deal with both
positive and negative summands.
Note: The signature of math.fsum() differs from __builtin__.sum()
because the start argument doesn't make sense in the context of
accurate summation. sum(seq2, start=sum(seq1)) may not equal the
accurate result returned by sum(itertools.chain(seq1, seq2)).
*/
#define FSUM_MIN_EXP (DBL_MIN_EXP - DBL_MANT_DIG)
#define FSUM_SIZE 30
#define FSUM_BASE ((long) 1 << FSUM_SIZE)
/* allow at least 60 extra bits at the top end, to cope with intermediate
overflow. On an IEEE 754 machine, FSUM_ACC_SIZE is 72. */
#define FSUM_ACC_SIZE ((DBL_MAX_EXP - FSUM_MIN_EXP + 60) / FSUM_SIZE + 1)
/* _fsum_twopower[i] is 2.0**(i+1) */
static
const double _fsum_twopower[FSUM_SIZE] = {
2.0,
4.0,
8.0,
16.0,
32.0,
64.0,
128.0,
256.0,
512.0,
1024.0,
2048.0,
4096.0,
8192.0,
16384.0,
32768.0,
65536.0,
131072.0,
262144.0,
524288.0,
1048576.0,
2097152.0,
4194304.0,
8388608.0,
16777216.0,
33554432.0,
67108864.0,
134217728.0,
268435456.0,
536870912.0,
1073741824.0,
};
static PyObject * math_fsum(PyObject * self, PyObject * seq) {
PyObject * item, * iter;
double x, m, inf_sum = 0.0, special_sum = 0.0;
int e, q, q_top, i, round_up, ndigits, top_exp;
long digit, half_eps, acc[FSUM_ACC_SIZE], carry, sign;
iter = PyObject_GetIter(seq);
if (iter == NULL)
return NULL;
/* initialize accumulator */
for (i = 0; i < FSUM_ACC_SIZE; i++)
acc[i] = 0;
/********************************************
* Stage 1: accumulate values from iterable *
********************************************/
for (;;) {
item = PyIter_Next(iter);
if (item == NULL) {
Py_DECREF(iter);
if (PyErr_Occurred())
return NULL;
break;
}
x = PyFloat_AsDouble(item);
Py_DECREF(item);
if (PyErr_Occurred()) {
Py_DECREF(iter);
return NULL;
}
/* accumulate specials in special_sum, infs in inf_sum */
if (!Py_IS_FINITE(x)) {
if (Py_IS_INFINITY(x))
inf_sum += x;
special_sum += x;
continue;
}
m = frexp(x, & e);
e -= (FSUM_MIN_EXP + 1);
/* add digits of m into accumulator */
m *= _fsum_twopower[e % FSUM_SIZE];
q_top = q = e / FSUM_SIZE;
for (;;) {
digit = (long) m;
m -= digit;
acc[q] += digit;
if (m == 0.0)
break;
m *= (double) FSUM_BASE;
q--;
}
/* normalize accumulator */
for (;;) {
if (acc[q] < -FSUM_BASE / 2) {
acc[q] += FSUM_BASE;
acc[++q]--;
} else if (acc[q] >= FSUM_BASE / 2) {
acc[q] -= FSUM_BASE;
acc[++q]++;
} else if (++q > q_top)
break;
}
}
/***************************************************
* Stage 2: compute correctly rounded return value *
***************************************************/
/* deal with any special values that occurred. See note above. */
if (special_sum != 0.0) {
if (Py_IS_NAN(inf_sum)) {
PyErr_SetString(PyExc_ValueError,
"-inf + inf in fsum");
return NULL;
}
return PyFloat_FromDouble(special_sum);
}
/* strip zero limbs from top */
ndigits = FSUM_ACC_SIZE;
while (ndigits > 0 && acc[ndigits - 1] == 0)
ndigits--;
if (ndigits == 0)
return PyFloat_FromDouble(0.0);
/* sign of result == sign of topmost nonzero limb */
sign = acc[ndigits - 1] > 0 ? 1 : -1;
/* take absolute value of accumulator, renormalizing digits
to lie in the range [0, FSUM_BASE) */
carry = 0;
for (i = 0; i < ndigits; i++) {
/* FSUM_BASE/2-1 <= acc[i]*sign + carry <= FSUM_BASE/2 */
digit = acc[i] * sign + carry;
if (digit < 0) {
acc[i] = digit + FSUM_BASE;
carry = -1;
} else {
acc[i] = digit;
carry = 0;
}
}
assert(carry == 0);
/* it's possible that the normalization leads to a zero top digit */
if (acc[ndigits - 1] == 0)
ndigits--;
assert(ndigits > 0 && acc[ndigits - 1] != 0);
/* Round acc * 2**FSUM_MIN_EXP to the nearest double. */
/* 2**(top_exp-1) <= |sum| < 2**top_exp */
top_exp = FSUM_MIN_EXP + FSUM_SIZE * (ndigits - 1);
for (digit = acc[ndigits - 1]; digit != 0; digit /= 2)
top_exp++;
/* catch almost all overflows here */
if (top_exp > DBL_MAX_EXP)
goto overflow_error;
m = 0.0;
if (top_exp <= DBL_MIN_EXP) {
/* no need for rounding */
for (i = ndigits - 1; i >= 0; i--)
m = FSUM_BASE * m + acc[i];
return PyFloat_FromDouble((double) sign *
ldexp(m, FSUM_MIN_EXP));
}
/* round: e is the position of the first bit to be rounded away. */
e = top_exp - DBL_MIN_EXP - 1;
half_eps = (long) 1 << (e % FSUM_SIZE);
q = e / FSUM_SIZE;
for (i = ndigits - 1; i > q; i--)
m = FSUM_BASE * m + acc[i];
digit = acc[q];
m = FSUM_BASE * m + (digit & (FSUM_BASE - 2 * half_eps));
if ((digit & half_eps) != 0) {
round_up = 0;
if ((digit & (3 * half_eps - 1)) != 0 ||
(half_eps == FSUM_BASE / 2 && (acc[q + 1] & 1) != 0))
round_up = 1;
else
for (i = q - 1; i >= 0; i--)
if (acc[i] != 0) {
round_up = 1;
break;
}
if (round_up == 1) {
m += 2 * half_eps;
if (top_exp == DBL_MAX_EXP &&
m == ldexp((double)(2 * half_eps), DBL_MANT_DIG))
/* overflow corner case: result before
rounding is within range, but rounded
result overflows. */
goto overflow_error;
}
}
return PyFloat_FromDouble((double) sign *
ldexp(m, FSUM_MIN_EXP + FSUM_SIZE * q));
overflow_error:
PyErr_SetString(PyExc_OverflowError,
"fsum result too large to represent as a float");
return NULL;
} |
This actually scales up the double values by 21075, which is one factor of two too many — every resulting BigInt is even. I also don't understand this snippet:
The if condition compares a string to a number using ===, which can never be true. |
Sorry, yes, 21075. I'd previously left the last bit as 0 because it let me write
Whoops, that's a leftover from an earlier version. You're right that it doesn't do anything now (and would be redundant with the |
I've simplified the main implementation following this code from the original Python issue. The implementation I had earlier was conceptually quite close to Shewchuk's algorithm: it added But it doesn't have to work like that. Instead, when overflow occurs during one of the " |
I'm trying to put together disparate sources to convince myself of the correctness of the theorems, specifically to make sure that the invariants assumed by a subsequent step actually match what the previous step claims to produce. Can you document the precise invariant that the partials are required to satisfy in the implementation? The details matter here. The Python issue assumes that the invariant is that they're nonoverlapping and in increasing magnitude. In addition to edge cases around overflows, some of what I'm looking at: "Nonoverlapping" does not mean that they're necessarily 53 bits apart; the binary values 11010000, 1000, 100, and 1.101 are nonoverlapping doubles, but some of the theorems in the papers require stronger input criteria which a sequence like that would violate. It's probably ok in this usage, but I need to check. Quick links: |
I've been relying on Theorem 10 of the Robust Arithmetic paper, which is fairly straightforward and which only needs only the assumption that the partials are finite, nonoverlapping, and increasing in magnitude (and implicitly that The later theorems are much more complicated and require other properties, but aren't actually necessary for this implementation. From what I can tell the paper doesn't cover producing a single sum from the expansion once you've added all the values in, but I convinced myself of the correctness of the procedure used in |
Just to double-check: the invariant is that the partials are nonzero, finite, nonoverlapping, and increasing in magnitude? (Theorem 10 allows zero-valued partials in input and output, while crsum assumes that the partials are nonzero and can produce incorrect results if zero-valued partials exist). |
Yes, though strictly speaking that's covered by "increasing in magnitude" (except for possibly the first partial, for which it doesn't matter if it's zero). Theorem 10 does not guarantee that you end up with a list of partials with the "nonzero" property, but that's easily achieved by discarding partials which are 0. To be precise: I've been relying on a slightly modified version of Theorem 10, which looks like: If you have an expansion e where all the partials are finite, nonoverlapping and increasing in magnitude, and the Two-Sum algorithm doesn't overflow, then applying the Grow-Expansion algorithm given e and b followed by a step which discards any partials which are 0 will produce a new expansion e' where all the partials are finite, nonoverlapping, and increasing in magnitude, and such that the mathematic sum of e' is equal to the mathematical sum of e and b. |
Incidentally there's been some research into faster approaches since Shewchuk '96; the fastest I've found is described in this post and given in code here. Per its claims there, on medium-size and larger lists of doubles (~1k+ elements), you can get exact summation with a 4x slowdown over naive summation by using 67 64-bit values (536 bytes). If you have a larger list (10k+ elements) and are willing to spend more memory, you can get exact summation with less than 2x slowdown over naive summation at the cost of using 33 Kb of memory. |
Hi. I notice that my xsum algorithm and software is mentioned here. You may be interested that it's used by the xsum package for Julia (https://juliapackages.com/p/xsum). You may also be interested in the following recent paper (which I haven't read in full detail, but which looks interesting): Lange, M., Towards accurate and fast summation, https://web.archive.org/web/20220616031105id_/https://dl.acm.org/doi/pdf/10.1145/3544488 One advantage of superaccumulators, as used in my method, is that one can easily give the correct result when it's a finite double even if naive summation would produce overflow for an intermediate result. One just needs enough bits in the superaccumulator to handle summation of the largest possible double the maximum concievable number of times. In my code, I think I assume a maximum of 2^32 operands, but one can allow for more by just adding a few more bits to the superaccumulator. |
Hm, seems like the xsum library has a bug where it incorrectly handles rounding of sums whose value is below I suspect it's some trivial error but I'm not likely going to be able to track it down myself in the near future. It passes my other tests, though I haven't yet set up a fuzzer for it. But engines which want to use it will need to either fix or work around this bug first. |
I'll be looking into the issue with xsum, and hope to have a fix within a week or so. |
I propose making the sum be the deterministic mathematical sum of the inputs, rounded to the nearest double, breaking ties to even in the typical IEEE double fashion. The special cases are as for regular addition:
This can be implemented as follows, using the exact intermediate sum technique which takes an arbitrary number of finite IEEE double addends a0, a1, …, an-1 and, barring intermediate or final overflows, represents their exact mathematical sum a0 + a1 + … + an-1 as s = p0 + p1 + …+ pk-1, where the p's are IEEE doubles, all have the same sign, are in descending magnitude order, are non-overlapping, meaning that each successive p's exponent is at least 53 less than the previous one, and are nonzero (except possibly p0).
This is the simplest approach. The algorithm is asymptotically linear in the number of addends n and efficient, typically having k no more than 2.
The steps can be folded into a mostly or fully single-pass algorithm. For example:
If one wishes to do a fully single-pass algorithm, one can avoid the scan for addends of the opposite sign from the running total by scaling down the most significant partial p0 and values added into it by 52 powers of 2 if it would overflow, while leaving the other partials p1, …, pk-1 unscaled. This is a bit tedious requiring scaling when working across the first and second partial but can be done efficiently.
The text was updated successfully, but these errors were encountered: