Summary: ParseNumberF64
, StringToDouble
and similarly named functions take
a string like "12.5"
(one two dot five) and return a 64-bit double-precision
floating point number like 12.5
(twelve point five). Some numbers (like
12.3
) aren’t exactly representable as an f64
but ParseNumberF64
still has
to return the best approximation. In March 2020, Daniel Lemire
published
some source code for a new,
fast algorithm to do this, based on an original idea by Michael Eisel. Here’s
how it works.
First, a caveat. The Eisel-Lemire algorithm is very fast (Lemire’s blog
post
contains impressive benchmark numbers, e.g. 9 times faster than the C standard
library’s strtod
) but it isn’t comprehensive. There are a small proportion of
strings that are valid numbers but it cannot parse, where Eisel-Lemire will
fail over to a fallback ParseNumberF64
implementation.
The primary goal is speed, for 99+% of the cases, not 100% coverage. As long as Eisel-Lemire doesn’t claim false positives, the combined approach is both fast and correct. Update on 2020-10-08: To be clear, combining with the fallback means that Eisel-Lemire-with-the-fallback is (much) faster than just-the-fallback, for ‘only’ 99+% of the cases, but still correct for 100% of the cases, including subnormal numbers, infinities and all the rest.
If falling back to strtod
, know that it can be sensitive to
locale-related environment
variables (i.e. whether twelve point five is "12.5"
or "12,5"
). Discussing
fallback algorithms any further is out of scope for this blog post. Update on
2020-11-02: the Simple Decimal Conversion fallback algorithm is discussed in
the next blog post.
Let [I .. J]
denote the half-open range of numbers simultaneously greater
than or equal to I
and less than J
. The lower bound is inclusive but the
upper bound is exclusive.
Let [I ..= J]
denote a closed range, where the upper bound is now inclusive
and its constraint is now “less than or equal to”.
Let (X ** Y)
denote X
raised to the Y
th power. For example, here are some
different ways to write “one thousand”:
1e3
1000
10 ** 3
Similarly, here are some different ways to write “sixty four”:
0x40
64
2 ** 6
1 << 6
Exponents can be negative. (10 ** -1)
is one tenth and (2 ** -3)
is one
eighth.
Let A ~MOD+ B
denote modular addition, where the modulus is usually clear
from the context. For example, working with u8
values would use a modulus of
256
. (100 + 200)
would normally be 300
, which overflows a u8
, but (100
~MOD+ 200)
would be 44
without overflow. In C/C++, for unsigned integer
types, the "~MOD+"
operator is simply spelled "+"
.
In C/C++, this type is called double
. Go calls it float64
. Rust calls it
f64
. We’ll use f64
in this blog post, as well as u64
for 64-bit unsigned
integers and i32
for 32-bit signed integers.
Wikipedia’s double-precision floating
point
article has a lot of detail. More briefly, a 64-bit value (e.g.
0x40840000_00000000
) is split into:
0x0
, meaning non-negative0x408 - 1023 = 9
0x40000_00000000
is implicitly 0x140000_00000000
whose 53 bits, in
binary, is 0b10100_00000000_00000000_00000000_00000000_00000000_00000000
,
interpreted as ((1*1) + (0*½) + (1*¼) + (0*⅛) + etc) = 1.25
Let AsF64(0x40840000_00000000)
denote reinterpreting those 64 bits as an
f64
bit pattern. Its value is therefore (1.25 * (2 ** 9))
, which is 640
in decimal. An equivalent derivation starts with 0x00140000_00000000 =
5629499534213120
and then (5629499534213120 >> (52-9)) = 640
.
Similarly, AsF64(0x43400000_00000000)
and AsF64(0x43400000_00000001)
are
9007199254740992
and 9007199254740994
in decimal, also known as ((1<<53) +
0)
and ((1<<53) + 2)
. The integer in between, 9007199254740993 = ((1<<53) +
1)
, is not exactly representable as an f64
. Relatedly, the slightly smaller
9007199254740991 = ((1<<53) - 1)
is also known in JavaScript as
Number.MAX_SAFE_INTEGER
.
The sign bit (corresponding to a leading "-"
minus sign in the string form)
is trivial to parse and we won’t spend any more time discussing it.
Non-normal numbers include subnormal numbers (with a biased exponent of
0x000
) and non-finite numbers (with a biased exponent of 0x7FF
and whose
value is either infinite or Not-a-Number). We similarly won’t spend much time
on these.
Typically, when rounding a decimal fraction to an integer, 7.3
rounds down to
7
and 7.6
rounds up to 8
. Rounding numbers like 7.5
, half-way between
two integers, is subject to more debate. One option is rounding to
even, alternating between rounding
down and up:
70.5
rounds to 70
, rounding down71.5
rounds to 72
, rounding up72.5
rounds to 72
, rounding down73.5
rounds to 74
, rounding up74.5
rounds to 74
, rounding down75.5
rounds to 76
, rounding upProperly parsing f64
values similarly rounds to even: the evenness of the
least significant bit of the 53-bit mantissa. This isn’t necessarily the same
as rounding the overall value to an integer:
9007199254740990
is exactly representable as AsF64(0x433FFFFF_FFFFFFFE)
9007199254740991
is exactly representable as AsF64(0x433FFFFF_FFFFFFFF)
9007199254740992
is exactly representable as AsF64(0x43400000_00000000)
9007199254740993
rounds to 9007199254740992 = AsF64(0x43400000_00000000)
,
rounding down9007199254740994
is exactly representable as AsF64(0x43400000_00000001)
9007199254740995
rounds to 9007199254740996 = AsF64(0x43400000_00000002)
,
rounding up9007199254740996
is exactly representable as AsF64(0x43400000_00000002)
9007199254740997
rounds to 9007199254740996 = AsF64(0x43400000_00000002)
,
rounding down9007199254740998
is exactly representable as AsF64(0x43400000_00000003)
9007199254740999
rounds to 9007199254741000 = AsF64(0x43400000_00000004)
,
rounding up9007199254741000
is exactly representable as AsF64(0x43400000_00000004)
9007199254741001
rounds to 9007199254741000 = AsF64(0x43400000_00000004)
,
rounding down9007199254741002
is exactly representable as AsF64(0x43400000_00000005)
9007199254741003
rounds to 9007199254741004 = AsF64(0x43400000_00000006)
,
rounding upFor clarity, this blog post presents the Eisel-Lemire algorithm in Static
Single Assignment
form. For example, a separate AdjE2_1
variable is defined below, based on
AdjE2_0
, instead of destructively modifying a single AdjE2
variable over
time. Implementations are obviously free to use a more traditional imperative
programming style.
u64
ValuesSome compilers (and some instruction
sets) provide a built-in u128
representation for multiplying two u64
values without overflow. When they
don’t, it’s relatively
straightfoward
to implement with u64
operations:
u64
is split into a high and low 32 bits.u64
).u64
values are re-assembled into a u128
.The smallest and largest positive, finite f64
values, DBL_TRUE_MIN
and
DBL_MAX
, are approximately 4.94e-324
and 1.80e+308
. We’ll pre-compute two
approximations, called the narrow (low resolution) and wide (high
resolution) approximations, to each power-of-10 in a range, such as from
1e-325
to 1e+308
inclusive. Implementations can choose a smaller
range, discussed in
the Exp10
Range section below.
For each base-10 exponent E10
, the narrow approximation to (10 ** E10)
is
the unique pair of a u64
-typed mantissa M64
and an i32
-typed base-2
exponent E2
such that:
M64
is set: M64 >= 0x80000000_00000000
.(10 ** E10) >= ((M64 + 0) * (2 ** E2))
(10 ** E10) < ((M64 + 1) * (2 ** E2))
The >=
in the first condition is ==
when the approximation is exact. When
inexact, the approximation rounds down (truncates). Whether exact or not, the
residual R64
, defined as:
(10 ** E10) = ((M64 + R64) * (2 ** E2))
or, equivalently,R64 = ((10 ** E10) / (2 ** E2)) - M64
implies that R64
is in the range [0 .. 1]
.
Here’s an exact example, for 1e3
, also known as (250 * 4)
or (0xFA << 2)
:
1e3 = (0xFA000000_00000000 * (2 ** -54))
Here’s an inexact example, for 1e43
:
1e43 ≈ (0xE596B7B0_C643C719 * (2 ** 79))
Specifically, these two numbers bracket 1e43
:
(0xE596B7B0_C643C719 << 79) = 9999999999999999999741184793924429452148736
(0xE596B7B0_C643C71A << 79) = 10000000000000000000345647703731744039501824
The (0xE596B7B0_C643C719, 79)
pair represents an inclusive-lower
exclusive-upper bound range for 1e43
.
The narrow powers-of-10 look-up table has two columns for each E10
row: M64
and NarrowBiasedE2
. The NarrowBiasedE2
value is E2
plus a NarrowBias
constant (the magical number 1150
) which is discussed later.
The wide approximation is like the narrow one except its mantissa M128
is 128
bits instead of 64.
1e43 = (0xE596B7B0_C643C719_6D9CCD05_D0000000 * (2 ** 15))
The wide powers-of-10 look-up table splits M128
’s bits in half to have three
columns: M128Lo
, M128Hi
and WideBiasedE2
. For the 1e43
example, these
are 0x6D9CCD05_D0000000
, 0xE596B7B0_C643C719
and (WideBias + 15)
. The
last two columns are shared with the narrow look-up table (M64 = M128Hi
and
WideBias = NarrowBias + 64 = 1214
) so that a single table holds both the
narrow and wide approximations.
The powers-of-10 look-up table is generated by this Go
program.
The two u64
columns are explicitly printed, and the one i32
column is
implied by a linear expression with slope log(10)/log(2)
.
Man:Exp10
FormParsing starts by converting the string to an integer mantissa and base-10 exponent. For example:
"1.23e45"
becomes (123 * (10 ** 43))
"67800.0"
becomes (678 * (10 ** 2))
"3.14159"
becomes (314159 * (10 ** -5))
If the mantissa is zero, then the parsed f64
is trivially zero.
If the mantissa is non-zero but less than (1 << 53)
then it is still exactly
representable as an f64
. Likewise, the first 23 powers of 10, from 1e0
to
1e22
, are also exactly representable.
Thus, parsing "67800.0"
can be done by simply converting (in the C++
static_cast<double>
sense) the u64 678
to an f64 678
and then multiplying
by 1e2
. Converting "3.14159"
is similar, except dividing instead of
multiplying by 1e5
. Such cases don’t need to run Eisel-Lemire or the fallback
algorithm.
This Go snippet agrees:
smallPowersOf10 := [23]float64{
1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7,
1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
}
u := uint64(314159)
f := float64(u)
e := smallPowersOf10[5]
fmt.Printf("0x%016X\n", math.Float64bits(f/e))
fmt.Printf("0x%016X\n", math.Float64bits(3.14159))
// Output:
// 0x400921F9F01B866E
// 0x400921F9F01B866E
Man
RangeAs mentioned earlier, the Eisel-Lemire algorithm is not comprehensive. For
example, the fallback applies if the mantissa part of the Man:Exp10
form
overflows a u64
. In practice, it’s easier to check the looser condition that
Man
has at most 19 decimal digits (and is non-zero):
(1 << 63) = 9223372036854775808
, which has 19 decimal digits(1 << 64) = 18446744073709551616
, which has 20 decimal digits9999999999999999999 = 0x8AC72304_89E7FFFF
, which has 64 binary
and 16 hexadecimal digits99999999999999999999 = 0x5_6BC75E2D_630FFFFF
, which has 67 binary
and 17 hexadecimal digitsExp10
RangeSimilarly, the fallback applies when Exp10
is outside a certain range. In
Lemire’s original
code,
the range is [-325 ..= 308]
. The Wuffs library
implementation
uses a smaller range, [-307 ..= 288]
because it leads to a smaller look-up
table. More importantly, combining the smaller range with the Man
range of
[1 ..= UINT64_MAX]
, approximately [1 ..= 1.85e+19]
, means that (Man * (10
** Exp10))
is in the range [1e-307 ..= 1.85e+307]
. This is entirely within
the range of normal (neither subnormal nor non-finite) f64
values: DBL_MIN
and DBL_MAX
are approximately 2.23e–308
and 1.80e+308
. Note that the
awkwardly named (but C++ standard) DBL_MIN
constant is larger than
DBL_TRUE_MIN
.
Continuing with the parsing "1.23e45"
example, let TV
denote the true
numerical value 1.23e45
(not just the closest f64
value).
With the equivalent Man:Exp10
form: 123e43
, the Exp10
part indexes the
look-up table. For 1e43
, recall that M64
is 0xE596B7B0_C643C719
and
NarrowBiasedE2
is (NarrowBias + 79)
:
1e43 ≈ (0xE596B7B0_C643C719 * (2 ** 79))
The next step is to normalize the u64 123
value so that its high bit is set.
In hexadecimal, 0x00000000_0000007B
has 57 leading zero bits (CLZ
or Count
Leading Zeroes is a common
bit-manipulation function that typically has hardware and compiler support).
The zero mantissa case was handled above, so the non-zero mantissa here has a
well-defined CLZ
. Shifting Man
left by CLZ(Man)
gives a normalized mantissa,
NorMan
, whose high bit is set. We’ll also track AdjE2_0
, and adjusted base-2
exponent, based on the look-up table’s NarrowBiasedE2
and this shift:
NorMan = (Man << CLZ(Man)) = (0x7B << 57) = 0xF6000000_00000000
AdjE2_0 = (NarrowBiasedE2 - CLZ(Man)) = ((1150 + 79) - 57) = 1172
We won’t need it just yet, but as we’re defining the CLZ(arg)
function to
return the count of leading zeroes, let’s also define the LSB(arg)
and
MSB(arg)
functions to return the Least and Most Significant Bits. For a u64
arg
, LSB(arg) = (arg & 1)
and MSB(arg) = (arg >> 63)
.
The essential idea is that, after converting the input string to the normalized
Man:Exp10
form, we combine 64 bits of input mantissa with 64 bits of Exp10
mantissa to produce more than enough for the 53 bits of f64
mantissa. The
f64
base-2 exponent is basically AdjE2_0
with one or two more tweaks,
described below.
The 64+64 intermediate mantissa bits will need to be properly rounded to
produce the right 53 f64
mantissa bits. Furthermore, the look-up table
doesn’t always give the exact value of (10 ** Exp10)
, only a range. Still, we
are often able to produce a conclusive rounding when every number in that
range would round to the same 53 bits.
An analogy is rounding a number to the nearest integer when only knowing the
first three decimal digits (a lower bound). If you know that a number is in the
range [10.234 .. 10.235]
then you know that the nearest integer is 10
, even
if you don’t know exactly what the number is. Similarly, anything in the range
[3.999 .. 4.000]
certainly rounds to 4
. Subtly, rounding anything in the
range [8.499 .. 8.500]
also certainly rounds to 8
because the upper bound
of a ..
range is exclusive. However, rounding a number in the range [8.500
.. 8.501]
is ambiguous. “Eight and a half exactly” rounds down (per round to
even) but “eight and a half and a little bit more” rounds up. When the
Eisel-Lemire algorithm encounters an ambiguous case, it simply fails over to
the fallback algorithm.
“Knowing the first three decimal digits” means that the size of a range like
[10.234 .. 10.235]
is 0.001
. Let’s call that an example of a “1-unit
range”, for an appropriate definition of “unit”. A “2-unit range”, like
[10.234 .. 10.236]
still round unambiguously, as does [3.999 .. 4.001]
and
[8.498 .. 8.500]
. The patterns to look out for are [8.499 .. 8.501]
and
[8.500 .. 8.502]
.
More on this later, but to recap, for decimal digits:
499
case means that rounding is ambiguous for a 2-unit range, but
unambiguous for a 1-unit range.500
case means that rounding is ambiguous for both 1-unit and 2-unit
ranges, unless we know that a later digit is non-zero (so that we’re “a half
and a little bit more”) or that the integer part is odd (so that “a half
exactly” would still round up). With more precision, a lower bound of
8.500000
is still ambiguous but a lower bound of 8.500012
is not.
Alternatively, a lower bound of 9.500
is also not ambiguous: both 9.5000
exactly and 9.5001
round to even to 10
.NorMan
and M64
are both u64
values whose high bits are set, so
multiplying them together produces a u128
value W
that has only 0 or 1
leading zero bits. Split W
into high and low 64-bit halves, WHi
and WLo
,
and WHi
likewise has only 0 or 1 leading zero bits.
A small-scale analogy is that multiplying (without overflow) two u8
values
both in the range [0x80 ..= 0xFF]
produces a u16
value in the range
[0x4000 ..= 0xFE01]
, so its high 8 bits are a u8
in the range [0x40 ..=
0xFE]
. If that u8
’s high bit (the 0x80
bit) is 0 then its second-highest
bit (the 0x40
bit) must be 1.
Anyway, we already knew:
NorMan = 0xF6000000_00000000
M64 = 0xE596B7B0_C643C719
Therefore:
W = NorMan * M64 = 0xDC9ED483_DE852152_06000000_00000000
WHi = 0xDC9ED483_DE852152
WLo = 0x06000000_00000000
MSB(WHi) = (WHi >> 63) = 1
CLZ(WHi) = (1 - MSB(WHi)) = 0
When scaled by an appropriate power-of-2 (i.e. for an appropriately defined
“unit”), [WHi .. (WHi + 1)]
is therefore a 1-unit range that contains the
scaled (NorMan * M64)
.
Recall that while Man
is exact and NorMan = (Man * (2 ** CLZ(Man)))
is
exact, M64
and E2
form an approximation. The difference between the
power-of-10 approximation (M64 * (2 ** E2))
and the true power-of-10 (10 **
M10)
is (R64 * (2 ** E2))
. Therefore the difference between:
(NorMan * M64 * (2 ** (E2 - CLZ(Man))))
andTV = (Man * (10 ** M10))
isET = (NorMan * R64 * (2 ** (E2 - CLZ(Man))))
.Focusing just on the (NorMan * R64)
part of ET
, NorMan
is a u64
and
therefore less than (2 ** 64)
at W
scale, so it must be less than 1
at
WHi
scale. R64
is less than 1
. Therefore, ET
is less than (1 * 1)
at
WHi
scale. Combining that 1-unit for ET
with the range at the top of this
section gives that [WHi .. (WHi + 2)]
is therefore a 2-unit range that
contains the scaled true value TV
.
As discussed in the “Rounding Ranges” section above, a 2-unit range is good
enough to work with unless we’re in the base-2 equivalent of the 499
case. As
we’ll see in the following sections, we’re about to shift right by 9 or 10 bits
and then again by 1 more bit, so the base-2 equivalent of 499
is that the low
10 bits are 0x1FF
or the low 11 bits are 0x3FF
. Recall that the primary
goal is speed, not perfect coverage. A fast and simple check for both is that
((WHi & 0x1FF) == 0x1FF)
.
Even if that condition holds, we can still proceed if we can narrow the 2-unit
range to a 1-unit range. First, the error term ET
is (NorMan * R64)
at W
scale, which is less than NorMan
at the same W
scale. Equivalently, (W +
NorMan)
is an upper bound for TV
. If (WLo + NorMan)
does not overflow a
u64
then the high 64 bits of that upper bound are the same as WHi
and we
have a 1-unit range, starting at WHi
, at WHi
scale. The test for overflow
is that ((WLo ~MOD+ NorMan) < NorMan)
.
Thus, if either ((WHi & 0x1FF) != 0x1FF)
or ((WLo ~MOD+ NorMan) >= NorMan)
are true, and that’s the case for the "1.23e45"
example, then we can simply
rename W
to X
and skip the rest of this section.
X = W = 0xDC9ED483_DE852152_06000000_00000000
Otherwise, we might have a 499
case but all is not yet lost. We can refine
our approximation to TV
. Before, we used (NorMan * M64)
, a 128-bit value,
based on the narrow approximation to (10 ** E10)
. This time, we would use
(NorMan * M128)
, a 192-bit value, based on the wide approximation.
The 192-bit computation is largely straightfoward but uninteresting and we
won’t dwell on it for this blog post. At the end of it, we set X
to its high
128 bits and there’s another “fail over to the fallback” check, similar to the
two-part check a few paragraphs above that started with ((WHi & 0x1FF) !=
0x1FF)
, but it has three parts instead of two, because 192 is three times 64.
Let XHi
and XLo
be X
’s high and low 64 bits. Recall that CLZ(X)
is
either 0 or 1, so that CLZ(XHi)
must also be either 0 or 1 and that, either
way, (CLZ(XHi) + MSB(XHi) == 1)
.
XHi = 0xDC9ED483_DE852152
MSB(XHi) = (XHi >> 63) = 1
CLZ(XHi) = (1 - MSB(XHi)) = 0
Now, we know that XHi
has either 0 or 1 leading zero bits. Shifting X
right
by (9 + MSB(XHi))
therefore results in a u64
that has exactly 10 leading 0
bits and then a 1 bit: a 54-bit number. We also tweak AdjE2_0
(and for this
example, tweak it by zero) to produce AdjE2_1
:
X54 = (XHi >> (9 + MSB(XHi))) = 0x003727B5_20F7A148
AdjE2_1 = (AdjE2_0 - (1 - MSB(XHi))) = 1172 = 0x494
We now detect the equivalent of the 500
ambiguity, discussed in the “Rounding
Ranges” section above. Like the 499
case, a necessary condition is that the
low 10 bits of XHi
are 0x200
or the low 11 bits are 0x400
. Again, a
slightly faster-looser check for both is that ((XHi & 0x1FF) == 0x000)
and
that (LSB(X54) == 1)
. Another necessary condition is that XLo
is all
zeroes, otherwise we’d have “a half and a little bit more”. Finally, with round
to even, ambiguity requires that the equivalent of the ‘integer part’ be even,
which is that (LSB(X54 >> 1) == 0)
.
That multiple-part condition can be re-arranged to be (XLo == 0)
and ((XHi &
0x1FF) == 0)
and ((X54 & 3) == 1)
. If all three are true, Eisel-Lemire fails
over to the fallback.
Otherwise, rounding to 53 bits (exactly what we need for an f64
’s 52-bit
mantissa with an explicit 53rd bit that’s 1) just depends on LSB(X54)
: 0
means to round down and 1
means to round up.
This simply involves adding X54
’s low bit to itself and then right shifting
by 1:
X53 = ((X54 + (X54 & 1)) >> 1) = 0x001B93DA_907BD0A4
Note that (X54 + 1)
can overflow 54 bits. It does not, in this case, but if
it did (i.e. if (X53 >> 53)
was 1
instead of 0
), shift and add by 1 more:
Overflow = (X53 >> 53) = 0
RetMan = ((X53 >> Overflow) & 0xFFFFF_FFFFFFFF) = 0xB93DA_907BD0A4
RetExp = (AdjE2_1 + Overflow) = 0x494
The RetExp
value started with the magical NarrowBiasE2
constant. That
magical number 1150 (which is 1023 + 127) was chosen so that RetExp
here is
exactly the 11-bit f64
base-2 exponent, including its 1023 bias.
In Lemire’s original code, there is one final fail-over check
that RetExp
is in the range [0x001 .. 0x7FF]
. Too small and we’re
encroaching on subnormal f64
space. Too large and we’re encroaching on
nonfinite f64
space. The Wuffs library implementation is
tighter, as discussed in the Exp10
Range section above, so it can skip the
check here. This trades off speeding up the common cases for slowing down the
rare cases.
Packing the 52 bits of RetMan
with 11 bits of RetExp
(left shifted by 52)
produces our final f64
return value:
"1.23e45"
produces an f64
whose bits are 0x494B93DA_907BD0A4
This Go snippet agrees that AsF64(0x494B93DA_907BD0A4)
is the closest
approximation to 1.23e45
(and the Go compiler, as of version 1.15 released in
August 2020, does not use the Eisel-Lemire algorithm):
fmt.Printf("0x%016X\n", math.Float64bits(1.23e45))
const m = 0x1B93DA_907BD0A4
const e = 0x494
fmt.Printf("%46v\n", big.NewInt(0).Lsh(big.NewInt(1), e-1023-52))
fmt.Printf("%v\n", big.NewInt(0).Lsh(big.NewInt(m-1), e-1023-52))
fmt.Printf("%v\n", big.NewInt(0).Lsh(big.NewInt(m+0), e-1023-52))
fmt.Printf("%v\n", big.NewInt(0).Lsh(big.NewInt(m+1), e-1023-52))
// Output:
// 0x494B93DA907BD0A4
// 158456325028528675187087900672
// 1229999999999999815358543982490949384520335360
// 1229999999999999973814869011019624571608236032
// 1230000000000000132271194039548299758696136704
Update on 2020-11-02: link to a richer test suite.
The
nigeltao/parse-number-fxx-test-data
repository contains many test cases, one per line, that look like:
3C00 3F800000 3FF0000000000000 1
3D00 3FA00000 3FF4000000000000 1.25
3D9A 3FB33333 3FF6666666666666 1.4
57B7 42F6E979 405EDD2F1A9FBE77 123.456
622A 44454000 4088A80000000000 789
7C00 7F800000 7FF0000000000000 123.456e789
Parsing the fourth column (the string form) should produce the third column
(the 64 bits of the f64
form). In this snippet, the final line’s f64
representation is infinity. As before, DBL_MAX
is approximately 1.80e+308
.
The test cases (in string form) were found by running the equivalent of
/usr/bin/strings
and /bin/grep
over various source code repositories like
google/double-conversion
and
ulfjack/ryu
. The f64
form was then
calculated using Go’s
strconv.ParseFloat
function.
Hooking up a test program to that data set verifies that, on my computer:
strtod
strconv.ParseFloat
all agree on the f64
form for over several million unique test cases.
Everything but C’s strtod
should also be locale-independent.
This blog post is much, much longer than the actual source code. The core
function is about 80 lines of C/C++ code, excluding comments and the
powers-of-10 table.
lemire/fast_double_parser
is the original C++ implementation.
google/wuffs
has a C re-implementation. There are also
Julia
and
Rust
re-implementations.
Update on 2021-02-21: if you just want to see the code, Go 1.16 (released
February 2021) has a 70 line
implementation
(plus another 70 lines for float32
vs float64
, plus 700 lines for the
powers-of-10 table).
Published: 2020-10-07