2Sum

2Sum[1] is a floating-point algorithm for computing the exact round-off error in a floating-point addition operation.

2Sum and its variant Fast2Sum were first published by Møller in 1965.[2] Fast2Sum is often used implicitly in other algorithms, such as compensated summation algorithms.[1] Kahan's summation algorithm, which contains Fast2Sum, was published first, in 1965;[3] Dekker later factored Fast2Sum out of it in 1971 for use in double-double arithmetic algorithms.[4] The names 2Sum and Fast2Sum appear to have been applied retroactively by Shewchuk in 1997.[5]

Algorithm

Given two floating-point numbers $a$ and $b$ , 2Sum computes the floating-point sum $s:=\operatorname {fl} (a+b)$ and the floating-point error $t:=a+b-\operatorname {fl} (a+b)$ so that $s+t=a+b$ . The error $t$ is itself a floating-point number.

Inputs: floating-point numbers $a,b$
Outputs: sum $s=\operatorname {fl} (a+b)$ and error $t=a+b-\operatorname {fl} (a+b)$

$s:=a\oplus b$
$a':=s\ominus b$
$b':=s\ominus a'$
$\delta _{a}:=a\ominus a'$
$\delta _{b}:=b\ominus b'$
$t:=\delta _{a}\oplus \delta _{b}$
return $(s,t)$

Provided the floating-point arithmetic is correctly rounded to nearest (with ties resolved any way), as is the default in IEEE 754, and provided the sum does not overflow and, if it underflows, underflows gradually, it can be proven that $s+t=a+b$ .[1][6][2]
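As a minimal sketch, the six steps above can be transcribed into Python, assuming that `float` is an IEEE 754 binary64 number with round-to-nearest addition and subtraction (true on common platforms). The name `two_sum` and the sample inputs are illustrative choices, not taken from the cited sources. Exact rational arithmetic from `fractions.Fraction` is used only to check the identity $s+t=a+b$.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, t) with s = fl(a + b) and t the rounding error of that sum."""
    s = a + b              # rounded sum
    a_prime = s - b        # approximation of a recovered from s
    b_prime = s - a_prime  # approximation of b recovered from s
    delta_a = a - a_prime  # part of a lost in the rounded sum
    delta_b = b - b_prime  # part of b lost in the rounded sum
    t = delta_a + delta_b  # total rounding error
    return s, t


# Illustrative inputs: 1e-16 is too small to change 1.0 when added in binary64.
a, b = 1.0, 1e-16
s, t = two_sum(a, b)

# Fraction converts each float exactly, so this compares the true real values.
exact_match = Fraction(s) + Fraction(t) == Fraction(a) + Fraction(b)
```

Here the rounded sum `s` absorbs none of `b`, and the error term `t` carries it instead. No ordering of the inputs by magnitude is needed.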

A variant of 2Sum called Fast2Sum uses only three floating-point operations, for floating-point arithmetic in radix 2 or radix 3, under the assumption that the exponent of $a$ is at least as large as the exponent of $b$ , such as when $\left|a\right|\geq \left|b\right|$ :[1][6][7][4]

Inputs: radix-2 or radix-3 floating-point numbers $a$ and $b$, of which at least one is zero, or which respectively have normalized exponents $e_{a}\geq e_{b}$
Outputs: sum $s=\operatorname {fl} (a+b)$ and error $t=a+b-\operatorname {fl} (a+b)$

$s:=a\oplus b$
$z:=s\ominus a$
$t:=b\ominus z$
return $(s,t)$
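The three-operation variant can be sketched the same way, again assuming Python's `float` is binary64 (radix 2) with round-to-nearest arithmetic. The names `fast_two_sum` and `fast_two_sum_any_order` are hypothetical. The second function is an illustrative wrapper that enforces the precondition by swapping its arguments so that $\left|a\right|\geq \left|b\right|$; it is not part of the published algorithm.

```python
from fractions import Fraction


def fast_two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, t) with s = fl(a + b); exact only when |a| >= |b| (or one is zero)."""
    s = a + b  # rounded sum
    z = s - a  # the part of b that made it into s
    t = b - z  # the part of b that was rounded away
    return s, t


def fast_two_sum_any_order(a: float, b: float) -> tuple[float, float]:
    """Illustrative wrapper: order the inputs by magnitude, then call fast_two_sum."""
    if abs(a) < abs(b):
        a, b = b, a
    return fast_two_sum(a, b)


# Illustrative inputs: near 1e16 the spacing between binary64 numbers is 2,
# so adding 1.0 cannot be represented in the rounded sum.
s, t = fast_two_sum(1e16, 1.0)
exact_match = Fraction(s) + Fraction(t) == Fraction(1e16) + Fraction(1.0)
```

The comparison in the wrapper costs a branch, which is one reason the branch-free six-operation 2Sum may be preferable when the relative magnitudes of the inputs are not known in advance.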

Even if the conditions are not satisfied, 2Sum and Fast2Sum often provide reasonable approximations to the error so that $s+t\approx a+b$ , which enables algorithms for compensated summation, dot-product, etc., to have low error even if the inputs are not sorted or the rounding mode is unusual.[1][2] More complicated variants of 2Sum and Fast2Sum also exist for rounding modes other than round-to-nearest.[1]
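To illustrate how the error term is used in compensated summation, the following sketch accumulates the rounding errors reported by 2Sum in a separate variable and adds them back at the end. It assumes binary64 `float` with round-to-nearest arithmetic. The names `two_sum` and `compensated_sum` and the sample data are illustrative, and this is one simple scheme rather than any particular published algorithm. `math.fsum` from the standard library serves as a correctly rounded reference.

```python
import math


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, t) with s = fl(a + b) and t the rounding error of that sum."""
    s = a + b
    a_prime = s - b
    b_prime = s - a_prime
    t = (a - a_prime) + (b - b_prime)
    return s, t


def compensated_sum(values: list[float]) -> float:
    """Sum values, carrying the per-step rounding errors in a side accumulator."""
    s = 0.0  # running rounded sum
    c = 0.0  # running total of rounding errors (itself rounded)
    for x in values:
        s, t = two_sum(s, x)
        c += t
    return s + c


def naive_sum(values: list[float]) -> float:
    """Plain left-to-right floating-point summation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total


# Illustrative data: the two 1.0 terms are lost next to 1e100 in a plain loop.
data = [1.0, 1e100, 1.0, -1e100]
naive = naive_sum(data)
compensated = compensated_sum(data)
reference = math.fsum(data)
```

On this input the plain loop loses both small terms, while the compensated version recovers them and agrees with `math.fsum`. The side accumulator `c` is itself subject to rounding, so the result is more accurate than the plain loop in general but is not guaranteed to be correctly rounded.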

References

[1] Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Torres, Serge (2018). Handbook of Floating-Point Arithmetic (2nd ed.). Cham, Switzerland: Birkhäuser. pp. 104–111. doi:10.1007/978-3-319-76526-6. ISBN 978-3-319-76525-9.
[2] Møller, Ole (March 1965). "Quasi double-precision in floating point addition". BIT Numerical Mathematics. 5: 37–50. doi:10.1007/BF01975722. S2CID 119991676.
[3] Kahan, W. (January 1965). "Further remarks on reducing truncation errors". Communications of the ACM. Association for Computing Machinery. 8 (1): 40. doi:10.1145/363707.363723. ISSN 0001-0782. S2CID 22584810.
[4] Dekker, T.J. (June 1971). "A floating-point technique for extending the available precision". Numerische Mathematik. 18 (3): 224–242. doi:10.1007/BF01397083. S2CID 63218464.
[5] Shewchuk, Jonathan Richard (October 1997). "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". Discrete & Computational Geometry. 18 (3): 305–363. doi:10.1007/PL00009321.
[6] Knuth, Donald E. (1998). The Art of Computer Programming, Volume II: Seminumerical Algorithms (3rd ed.). Addison–Wesley. p. 236. ISBN 978-0-201-89684-8.
[7] Sterbenz, Pat H. (1974). Floating-Point Computation. Englewood Cliffs, NJ, United States: Prentice-Hall. pp. 138–143. ISBN 0-13-322495-3.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

