Summary: ParseNumberF64
, StringToDouble
and similarly named functions take
a string like "12.5"
(one two dot five) and return a 64-bit double-precision
floating point number like 12.5
(twelve point five). Some numbers (like
12.3
) aren’t exactly representable as an f64
but ParseNumberF64
still has
to return the best approximation. This blog post describes a simple algorithm
to do just that.
The previous blog post discussed the Eisel-Lemire ParseNumberF64
Algorithm, which is fast but not comprehensive and needs a
fallback ParseNumberF64
algorithm. This blog post discusses a fairly simple
algorithm, which I’ll call Simple Decimal Conversion (SDC). SDC is the fallback
algorithm used by both
Go
and
Wuffs.
It’s not the fastest or cleverest algorithm, but a rarely-invoked fallback
doesn’t have to be, and there is value in
simplicity.
SDC landed in Go’s standard library in 2008 (and faster algorithms landed later). After adjusting for the names and notation used in this blog post, a comment in that commit describes SDC as:
[½ .. 1]
(2 ** precision)
and round to get mantissaFor example, when parsing the numerical portion of the speed of light
"2.99792458e8"
, calculate that 29
is such that (2.99792458e8 / (2 **
29))
, which is 0.5584069676697254180908203125
exactly, is in [½ .. 1]
.
That fraction times (2 ** 53)
is 5029682823036928
which is
0x11DE78_4A000000
. The leading (53rd) bit is implicit and dropped for normal
f64
numbers. (1023 - 1 - -29)
is 1051
which is 0x41B
. Combining the two
fragments (and a 0
sign bit for non-negativeness) gives the f64
bit pattern
AsF64(0x41B1DE78_4A000000)
.
As in the previous blog post, let [I .. J]
and [I ..=
J]
denote half-open and closed ranges and let (X ** Y)
denote
exponentiation.
A leading zero like 012
is decimal, not octal. In this case, the number
twelve.
Let A ~NR<< B
and A ~NR>> B
denote Non-Rounding shifts (N-R shifts), i.e.
multiplying or dividing by powers of 2 without truncating to an integer. A
can be a fraction but B
must be a non-negative integer. For example, (31 >>
2)
is 7
but (31 ~NR>> 2)
is the same as (31 / 4)
, which is 7.75
.
Similarly, (0.1 ~NR<< 4)
is 1.6
. If A
is an integer and overflow doesn’t
occur then ~NR<<
is equivalent to a regular left-shift <<
.
Mainstream programming languages give us 64-bit unsigned integer types, roughly
20 decimal digits, but SDC might process longer strings like
"314159265358979323846264338327"
. Nonetheless, we can still perform N-R
shifts on these decimal numbers without needing a “big integer” math library or
even heap-allocated memory. Specifically, we can N-R shift by S
using an
(S+4)
-bit unsigned integer. With u64
types, we can N-R shift for S
up to
60 (inclusive).
We do this by rolling a sliding window across the input and output digit
streams. At each step, we track a mutable (S+4)
-bit accumulator as one digit
rolls in (‘expanding’, increasing the accumulator) and one digit rolls out
(‘contracting’, decreasing the accumulator). The +4
is because 4 bits are
needed to hold one of ten possible decimal digits.
We’ll hand-wave away the location of the decimal point for now, focusing on
just input and output digit streams. For example, if we know that 1234.0 / 4 =
0308.5
then we also know that:
.12340000 / 4 = .03085000
1.2340000 / 4 = 0.3085000
1.2340e56 / 4 = 0.3085e56
0.0012340 / 4 = 0.0003085
To simplify the following N-R shift examples, we’ll place the decimal point on the left of all the digits when right-shifting, and on the right of all the digits when left-shifting.
Here’s an example of N-R right-shifting .299792458
by 3
(i.e. dividing by
8
), using a (3+4)
-bit accumulator. One could imagine a Babbage-esque
Shifting Engine that input and output streams of digits:
In table form:
_CON'_10____IN_____EXP________EXP__________OUT___S___CON
((0 * 10) + 2) = 02 = 0b_0000_010 = ((0 << 3) + 2)
((2 * 10) + 9) = 29 = 0b_0011_101 = ((3 << 3) + 5)
((5 * 10) + 9) = 59 = 0b_0111_011 = ((7 << 3) + 3)
((3 * 10) + 7) = 37 = 0b_0100_101 = ((4 << 3) + 5)
((5 * 10) + 9) = 59 = 0b_0111_011 = ((7 << 3) + 3)
((3 * 10) + 2) = 32 = 0b_0100_000 = ((4 << 3) + 0)
((0 * 10) + 4) = 04 = 0b_0000_100 = ((0 << 3) + 4)
((4 * 10) + 5) = 45 = 0b_0101_101 = ((5 << 3) + 5)
((5 * 10) + 8) = 58 = 0b_0111_010 = ((7 << 3) + 2)
((2 * 10) + 0) = 20 = 0b_0010_100 = ((2 << 3) + 4)
((4 * 10) + 0) = 40 = 0b_0101_000 = ((5 << 3) + 0)
The first column "0,2,5,3,…,4"
is the same as the last column
"2,5,3,5,…,0"
, offset by one row - the previous row’s contracted accumulator
value (initialized to zero). The second column is the constant 10
. The third
column "2,9,9,7,…,0"
is the input digits, padded with trailing zeroes. The
fourth column "02,29,59,37,…,40"
is the expanded accumulator ((first * 10) +
third)
, which is then repeated (as the fifth column) in binary (where the
(S+4)
ness is more obvious) and again as ((sixth << S) + last)
. The sixth
column "0,3,7,4,…,5"
is the output digits (in the range [0 ..= 9]
). The
seventh column is the constant S
.
Thus, reading the third (IN) and sixth (OUT) columns top-to-bottom, .299792458
/ (2 ** 3) = .03747405725
.
Algorithmically, each row represents three steps:
acc = ((acc * 10) + nextInputDigit)
(acc >> S)
acc &= mask(S)
, where mask(S) = ((1 << S) - 1)
The rows can also be grouped into three periods: early, middle and late. Early
rows produce zeroes until the expanded accumulator ‘warms up’ to at least (1
<< S)
. Late rows consume implicit zeroes when the input is exhausted but the
expanded accumulator ‘cools down’ to zero. Middle rows are what happens in
between: a sufficient but not necessary condition of middle rows is that the
input and output digits are both non-zero. Implementations might write that
expand-output-contract loop three times, once for each period.
Most of the rows in the “Right-Shift Example #1”, above, is in the middle period. For other inputs, the early and late periods can actually touch or overlap, in which case the middle period is non-existent.
Here’s a longer example (eliding the fourth column) of N-R right-shifting the
same number .299792458
, but this time by 29
. The early and late periods are
more obvious (as the top-left and bottom-right triangles of zeroes in the fifth
0b_etc
column) and, in this particular example, the early and late periods
touch:
_____CON'_____10____IN__EXP___________________EXP_____________________OUT___S________CON___
((000000000 * 10) + 2) = … = 0b_0000_00000000000000000000000000010 = ((0 << 29) + 000000002)
((000000002 * 10) + 9) = … = 0b_0000_00000000000000000000000011101 = ((0 << 29) + 000000029)
((000000029 * 10) + 9) = … = 0b_0000_00000000000000000000100101011 = ((0 << 29) + 000000299)
((000000299 * 10) + 7) = … = 0b_0000_00000000000000000101110110101 = ((0 << 29) + 000002997)
((000002997 * 10) + 9) = … = 0b_0000_00000000000000111010100011011 = ((0 << 29) + 000029979)
((000029979 * 10) + 2) = … = 0b_0000_00000000001001001001100010000 = ((0 << 29) + 000299792)
((000299792 * 10) + 4) = … = 0b_0000_00000001011011011111010100100 = ((0 << 29) + 002997924)
((002997924 * 10) + 5) = … = 0b_0000_00001110010010111001001101101 = ((0 << 29) + 029979245)
((029979245 * 10) + 8) = … = 0b_0000_10001110111100111100001001010 = ((0 << 29) + 299792458)
((299792458 * 10) + 0) = … = 0b_0101_10010101100001011001011100100 = ((5 << 29) + 313570020)
((313570020 * 10) + 0) = … = 0b_0101_11010111001101111110011101000 = ((5 << 29) + 451345640)
((451345640 * 10) + 0) = … = 0b_1000_01101000001011110000100010000 = ((8 << 29) + 218489104)
((218489104 * 10) + 0) = … = 0b_0100_00010001110101100101010100000 = ((4 << 29) + 037407392)
((037407392 * 10) + 0) = … = 0b_0000_10110010010111110101001000000 = ((0 << 29) + 374073920)
((374073920 * 10) + 0) = … = 0b_0110_11110111101110010011010000000 = ((6 << 29) + 519513728)
((519513728 * 10) + 0) = … = 0b_1001_10101101001111000000100000000 = ((9 << 29) + 363299072)
((363299072 * 10) + 0) = … = 0b_0110_11000100010110000101000000000 = ((6 << 29) + 411765248)
((411765248 * 10) + 0) = … = 0b_0111_10101011011100110010000000000 = ((7 << 29) + 359556096)
((359556096 * 10) + 0) = … = 0b_0110_10110010011111110100000000000 = ((6 << 29) + 374335488)
((374335488 * 10) + 0) = … = 0b_0110_11111000111110001000000000000 = ((6 << 29) + 522129408)
((522129408 * 10) + 0) = … = 0b_1001_10111001101101010000000000000 = ((9 << 29) + 389455872)
((389455872 * 10) + 0) = … = 0b_0111_01000001000100100000000000000 = ((7 << 29) + 136462336)
((136462336 * 10) + 0) = … = 0b_0010_10001010101101000000000000000 = ((2 << 29) + 290881536)
((290881536 * 10) + 0) = … = 0b_0101_01101011000010000000000000000 = ((5 << 29) + 224460800)
((224460800 * 10) + 0) = … = 0b_0100_00101110010100000000000000000 = ((4 << 29) + 097124352)
((097124352 * 10) + 0) = … = 0b_0001_11001111001000000000000000000 = ((1 << 29) + 434372608)
((434372608 * 10) + 0) = … = 0b_1000_00010111010000000000000000000 = ((8 << 29) + 048758784)
((048758784 * 10) + 0) = … = 0b_0000_11101000100000000000000000000 = ((0 << 29) + 487587840)
((487587840 * 10) + 0) = … = 0b_1001_00010101000000000000000000000 = ((9 << 29) + 044040192)
((044040192 * 10) + 0) = … = 0b_0000_11010010000000000000000000000 = ((0 << 29) + 440401920)
((440401920 * 10) + 0) = … = 0b_1000_00110100000000000000000000000 = ((8 << 29) + 109051904)
((109051904 * 10) + 0) = … = 0b_0010_00001000000000000000000000000 = ((2 << 29) + 016777216)
((016777216 * 10) + 0) = … = 0b_0000_01010000000000000000000000000 = ((0 << 29) + 167772160)
((167772160 * 10) + 0) = … = 0b_0011_00100000000000000000000000000 = ((3 << 29) + 067108864)
((067108864 * 10) + 0) = … = 0b_0001_01000000000000000000000000000 = ((1 << 29) + 134217728)
((134217728 * 10) + 0) = … = 0b_0010_10000000000000000000000000000 = ((2 << 29) + 268435456)
((268435456 * 10) + 0) = … = 0b_0101_00000000000000000000000000000 = ((5 << 29) + 000000000)
Again, reading the third (IN) and sixth (OUT) columns top-to-bottom,
.299792458 / (2 ** 29) = .0000000005584069676697254180908203125
.
Hence, as mentioned above, (2.99792458e8 / (2 ** 29))
is
0.5584069676697254180908203125
exactly, which is in [½ .. 1]
.
Similarly, N-R left-shifting an arbitrarily long digit stream can be done with
an (S+4)
-bit accumulator, consuming and producing one digit at a time. The
computation is just the reverse of the N-R right-shift. Subtly, this means that
the digits are processed right-to-left instead of left-to-right.
Here’s a reprisal of Example #1 above. The first column is the input digits
(right-to-left), padded with leading zeroes. The second column is the constant
S
. The third column is the fifth column offset by one row - the previous
row’s contracted accumulator value (initialized to zero). The fourth column is
the (S+4)
-bit expanded accumulator variable. The sixth column is the constant
10
. The last column is the output digits (in the range [0 ..= 9]
).
__IN___S___CON'____EXP_____CON__10___OUT
((5 << 3) + 0) = 40 = ((4 * 10) + 0)
((2 << 3) + 4) = 20 = ((2 * 10) + 0)
((7 << 3) + 2) = 58 = ((5 * 10) + 8)
((5 << 3) + 5) = 45 = ((4 * 10) + 5)
((0 << 3) + 4) = 04 = ((0 * 10) + 4)
((4 << 3) + 0) = 32 = ((3 * 10) + 2)
((7 << 3) + 3) = 59 = ((5 * 10) + 9)
((4 << 3) + 5) = 37 = ((3 * 10) + 7)
((7 << 3) + 3) = 59 = ((5 * 10) + 9)
((3 << 3) + 5) = 29 = ((2 * 10) + 9)
((0 << 3) + 2) = 02 = ((0 * 10) + 2)
Reading the first (IN) and last (OUT) columns bottom-to-top, 03747405725. *
(2 ** 3) = 29979245800.
Note that the bottom-to-top EXP column here,
"02,29,59,…,58,20,40"
is the same as the Right-Shift Example #1 top-to-bottom
EXP column, and likewise for the CON/CON’ columns.
Again, each row represents three steps. The calculations are reversed compared to N-R right shifting:
acc = ((nextInputDigit << S) + acc)
(acc % 10)
acc /= 10
, rounding down (u64
division)Again, running top-to-bottom, rows can be grouped into early, middle and late periods, and implementations may specialize for each period.
Here’s the C/C++ data structure for the SDC algorithm’s High Precision Decimal (HPD) numbers. “High precision” means that the mantissa holds 800 decimal digits. The 800 magic number is arbitrary but sufficiently large in practice.
typedef struct {
uint32_t num_digits;
int32_t decimal_point;
bool negative;
bool truncated;
uint8_t digits[800];
} HPD;
For example, the number 7.89
would be an HPD with:
num_digits = 3
,decimal_point = +1
,negative = false
,truncated = false
anddigits
would be 7
, 8
and 9
. The remaining
elements of digits
would be undefined. In C/C++, HPD values are typically
stack-allocated and the digits
array does not need zero-initializing.The number 78900
would be the same HPD, except that decimal_point
would be
+5
instead of +1
. To elaborate, decimal_point
may be negative or be
larger than num_digits
, in which case the explicit digits
are padded with
implicit zeroes. For example:
decimal_point
of -2
means .00789
decimal_point
of -1
means .0789
decimal_point
of +0
means .789
decimal_point
of +1
means 7.89
decimal_point
of +2
means 78.9
decimal_point
of +3
means 789.
decimal_point
of +4
means 7890.
decimal_point
of +5
means 78900.
If num_digits
is zero then the HPD value represents the number zero.
Otherwise, the canonical form has both digits[0]
and digits[num_digits-1]
not equal to zero. For example, representing the number 7.89
as {num_digits
= 4; decimal_point = +2; digits = {0, 7, 8, 9}}
is non-canonical, as is
{num_digits = 4; decimal_point = +1; digits = {7, 8, 9, 0}}
.
In canonical form (and positive num_digits
), a decimal_point
higher than
+2047 means that the overall value is infinity, lower than -2047 means zero.
Again, the magic numbers here are arbitrary but sufficiently large in practice.
negative
is a sign bit. An HPD can distinguish positive and negative zero.
The number -7.890000___a_thousand_zeroes___001
would be {num_digits = 800;
decimal_point = +1; negative = true; truncated = true}
and the digits
array
being 7
, 8
, 9
and then 797
zeroes. truncated
would be false if the
final digit was a 0
instead of a 1
. In the parlance of the previous blog
post, the truncated
boolean is there to distinguish
between “a half exactly” and “a half and a little bit more”.
We can perform Non-Rounding left- and right-shifts of HPD numbers per the “High Precision Non-Rounding Shifts” section above, with a few tweaks.
First, HPD numbers have a finite number (800) of explicit digits (and we don’t
want to dynamically allocate memory). If a shift would lay down a digit beyond
that, we simply set the truncated
boolean to true if it was a non-zero digit.
Second, the overall algorithm is simpler if shifting can maintain the invariant
that an HPD value is in canonical form. Specifically, the first num_digits
elements of its digits
array should have no leading or trailing zeroes.
Trimming trailing zeroes is trivial and cheap: just decrement num_digits
by
how many there are. Trimming leading zeroes is also trivial for right-shifting:
don’t bump the pointer (or increment the index) until we’ve emitted a non-zero
digit. Emitting what would otherwise be a leading zero instead only adjusts an
HPD’s decimal_point
field without affecting the digits
elements or
num_digits
.
Handling left-shifts in general is more difficult, for two reasons. One is
leading zeroes and two is working right-to-left. A naive approach would first
write to an intermediate buffer and then memcpy
the canonical digits from the
intermediate buffer back to the HPD digits
array.
A cleverer solution is credited to Ken
Thompson,
a Turing Awardee. The easy insight is that if you knew M
, the number of
additional leading non-zero digits that a left-shift introduces, then you can
skip the intermediate buffer and write straight to an HPD’s digits
array. The
hard insight is figuring out that, given the shift S
, M
is one of two
consecutive integers. Which one it is depends on a lexicographic comparison
to powers of 5, due to the happy coincidence that decimal digits work in base
10, shifts work in base 2 and the ratio of those bases is exactly 5.
Lexicographic means that "5678"
compares less than "625"
because of their
leading digits: "5"
compares less than "6"
.
For example, shifting by 4 (i.e. multiplying by 16) can add 1 or 2 new
digits, depending on a lexicographic comparison to (5 ** 4) = 625
:
(1 << 4) = 16
, which adds 1 new digit.(5678 << 4) = 90848
, which adds 1 new digit.(624 << 4) = 9984
, which adds 1 new digit.(62498 << 4) = 999968
, which adds 1 new digit.(625 << 4) = 10000
, which adds 2 new digits.(625001 << 4) = 10000016
, which adds 2 new digits.(7008 << 4) = 112128
, which adds 2 new digits.(99 << 4) = 1584
, which adds 2 new digits.Calculating M
based on S
involves a look-up table (for S == 4
, the table
entries are “max 2 new digits” and "625"
). The lexicographic comparison then
determines whether M
is max
or (max - 1)
.
As described at the top, SDC involves parsing the input string to fill out an HPD data structure. If it represents the number zero than we’re done.
Otherwise, repeatedly do Non-Rounding right-shifts (for a shift S
no more
than 60) until the HPD represents a number less than 1 (i.e. its
decimal_point
field is non-positive). Repeatedly do Non-Rounding left-shifts
(again for S
no more than 60) until the HPD represents a number in the range
[½ .. 1]
(i.e. its decimal_point
field is zero and its digits[0]
is at
least 5
). Getting to [½ .. 1]
is slightly easier, computationally, than
getting to the [1 .. 2]
range that a normal f64
bit-pattern represents, and
we’re usually about to multiply by a power of two anyway.
Update on 2023-02-04: actually, it’s just as easy, and faster, to target the
[1 .. 2]
range. See the Wuffs PR #87
discussion and commits
d46220ca
and
4d4c3c4c.
Shifting by (S == 1)
on each repetition is simple, but it’s faster (fewer
repetitions) to take larger shifts based on decimal_point
, provided that each
shift still obeys (S <= 60)
. The right-shifts can overshoot a little, because
the subsequent left-shifts will correct that, but the left-shifts start with
the HPD value below 1 and must keep it that way.
We use a small look-up table such that the I
th entry (starting at I = 0
) is
the largest power of 2 less than (10 ** I)
. For example, the fifth entry is
13 because ((2 ** 13) = 8192 < 10000 = (10 ** 4))
. If our HPD’s
decimal_point
is -4
(so that its value is less than 0.0001
) then
Non-Rounding left-shifting it by 13 will keep it less than 0.8192
and
therefore less than 1
). This small look-up table only has 19 entries, as ((2
** 59) < (10 ** 18))
and anything beyond that hits the (S <= 60)
constraint.
const uint8_t powers[19] = {
0, 3, 6, 9, 13, 16, 19, 23, 26, 29,
33, 36, 39, 43, 46, 49, 53, 56, 59,
};
For example, starting with the numerical portion of Planck’s constant
"6.62607015e-34"
, which is {decimal_point = -33; digits = {6, 6, 2, 6, 0, 7,
0, 1, 5}}
:
33
is outside the length of the powers
array, so left-shifting by 60
gives {decimal_point = -15; digits = {7, 6, 3, 9, 3, etc, 6, 6, 4}}
, which
is 7.6393387669685162332913664e-16
.15
is inside the length of the powers
array, so left-shifting by 49
gives {decimal_point = 0; digits = {4, 3, 0, 0, 5, etc, 1, 6, 8}}
, which is
4.3005654030345492606001512614662313607168e-1
.decimal_point
is 0 but the leading digit is less than 5
. We don’t use the
powers
look-up table here. Instead, left-shifting by 1
gives
{decimal_point = 0; digits = {8, 6, 0, 1, 1, etc, 3, 3, 6}}
, which is
8.6011308060690985212003025229324627214336e-1
. This is in the range [½ ..
1]
so we stop.This took a total of 0
right-shifts and (60 + 49 + 1 = 110)
left-shifts.
There’s one more implicit left-shift to get from [½ .. 1]
to [1 .. 2]
, for
a net base-2 exponent of -111
. Adding the f64
base-2 exponent bias of
1023
gives 912
, which is 0x390
.
If this exponent was too large then the parsed f64
is infinite. If this
exponent was too small then we would N-R right-shift the mantissa (and adjust
the biased base-2 exponent) until the exponent was in range (positive). This
might end with the mantissa below ½, so that the parsed f64
value is zero,
but otherwise we’d return a subnormal f64
. This example is neither, so:
53
gives {decimal_point = 16; digits = {7, 7, 4, 7, 2,
etc, 3, 1, 2}}
, which is
7747209898635537.19908586215205528506839523031474149261312
with 16
digits
to the left of the decimal point. Rounding to the nearest integer (rounding
ties to even) gives 7747209898635537
which is 0x001B860B_DE023111
.Combining the 52 mantissa bits 0xB860B_DE023111
with the 11 exponent bits
0x390
and the 1 sign bit 0x0
gives the f64
bit pattern
AsF64(0x390B860B_DE023111)
.
This is the same as for The Eisel-Lemire ParseNumberF64 Algorithm blog post.
Source code is available as C, C++ or Go.
Russ Cox’ “Floating Point to Decimal Conversion is
Easy” blog post from 2011 is about the
reverse conversion (ftoa
instead of atof
) but his introduction and
conclusion applies equally well here:
Floating point to decimal conversions have a reputation for being difficult. At heart, they’re really very simple and straightforward… Just remember: The conversion is easy. The optimizations are hard.
This blog post covers an easy, ‘unoptimized’ algorithm. The previous blog post
covers the Eisel-Lemire algorithm, first published in
2020, which is state-of-the-art fast but still relatively simple. Both
algorithms are easier to follow than e.g. the GNU C Library’s
strtod_l
implementation. The bottom of Cox’s blog
post has links to a number of other
approaches, including David Gay’s 1990 classic “Correctly Rounded
Binary-Decimal and Decimal-Binary
Conversions”.
Published: 2020-11-02