flt2dec
Floating-point number to decimal conversion routines.
Problem statement
We are given the floating-point number v = f * 2^e with an integer f,
and its bounds minus and plus such that any number between v - minus and
v + plus will be rounded to v. For simplicity we assume that
this range is exclusive. Then we would like to get the unique decimal
representation V = 0.d[0..n-1] * 10^k such that:
- d[0] is non-zero.
- It’s correctly rounded when parsed back: v - minus < V < v + plus. Furthermore it is the shortest such one, i.e., there is no representation with fewer than n digits that is correctly rounded.
- It’s closest to the original value: abs(V - v) <= 10^(k-n) / 2. Note that there might be two representations satisfying this uniqueness requirement, in which case some tie-breaking mechanism is used.
We will call this mode of operation the shortest mode. This mode is used
when there is no additional constraint, and can be thought of as a “natural” mode
as it matches the ordinary intuition (it at least prints 0.1f32 as “0.1”).
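The shortest-mode requirement can be illustrated by brute force: try an increasing number of significant digits until the printed value parses back to the original. This is only a sketch of the definition, not how this module works. The helper shortest_by_search is hypothetical, and it assumes that the standard library's {:e} formatting and str::parse are both correctly rounded.

```rust
/// Hypothetical helper (not part of flt2dec): returns the shortest
/// exponential-form string for a finite `v` that parses back to exactly `v`,
/// found by trying 1, 2, ..., 16 significant digits in turn.
fn shortest_by_search(v: f64) -> String {
    assert!(v.is_finite());
    for frac_digits in 0..16usize {
        // `{:.*e}` prints one leading digit plus `frac_digits` further digits.
        let candidate = format!("{:.*e}", frac_digits, v);
        if candidate.parse::<f64>().ok() == Some(v) {
            return candidate;
        }
    }
    // 17 significant digits are always enough to round-trip an f64.
    format!("{:.16e}", v)
}
```

For 0.1f64 the search stops at the first attempt with "1e-1", while 1.5 needs two digits and yields "1.5e0". The real strategies reach the same kind of answer without repeated formatting and parsing.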
We have two more modes of operation closely related to each other. In these modes
we are given either the number of significant digits n or the last-digit
limitation limit (which determines the actual n), and we would like to get
the representation V = 0.d[0..n-1] * 10^k such that:
- d[0] is non-zero, unless n was zero in which case only k is returned.
- It’s closest to the original value: abs(V - v) <= 10^(k-n) / 2. Again, there might be some tie-breaking mechanism.
When limit is given but not n, we set n such that k - n = limit
so that the last digit d[n-1] is scaled by 10^(k-n) = 10^limit.
If such n is negative, we clip it to zero so that we will only get k.
We are also limited by the supplied buffer. This limitation is used to print
the number up to a given number of fractional digits without knowing
the correct k beforehand.
We will call the mode of operation requiring n the exact mode,
and the one requiring limit the fixed mode. The exact mode is a subset of
the fixed mode: the sufficiently large last-digit limitation will eventually fill
the supplied buffer and let the algorithm return.
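The relation between the two modes can be observed through the standard formatting macros, used here only as a stand-in for the routines in this module. The helpers exact and fixed are hypothetical names. The sketch assumes that a precision of p in {:.pe} corresponds to n = p + 1 significant digits and that a precision of p in {:.p} corresponds to limit = -p.

```rust
/// Exact mode (hypothetical wrapper): `sig_digits` significant digits in
/// exponential form. Assumes `sig_digits >= 1`.
fn exact(v: f64, sig_digits: usize) -> String {
    // One digit before the point, `sig_digits - 1` after it.
    format!("{:.*e}", sig_digits - 1, v)
}

/// Fixed mode (hypothetical wrapper): the last printed digit is scaled by
/// 10^(-frac_digits), i.e. limit = -frac_digits.
fn fixed(v: f64, frac_digits: usize) -> String {
    format!("{:.*}", frac_digits, v)
}
```

With v = 1234.5678 = 0.12345678 * 10^4 we have k = 4, so fixed(v, 2) has limit = -2 and n = k - limit = 6, giving "1234.57", while exact(v, 4) gives "1.235e3". For v = 0.001 = 0.1 * 10^-2 and one fractional digit, n = -2 - (-1) is negative and is clipped to zero, so no significant digit survives and fixed(0.001, 1) gives "0.0".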
Implementation overview
It is easy to get the floating point printing correct but slow (Russ Cox has demonstrated how it’s easy), or incorrect but fast (naïve division and modulo). But it is surprisingly hard to print floating point numbers correctly and efficiently.
There are two classes of algorithms widely known to be correct.
- The “Dragon” family of algorithms was first described by Guy L. Steele Jr. and Jon L. White. They rely on fixed-size big integers for their correctness. A slight improvement was found later, and was described by Robert G. Burger and R. Kent Dybvig. David Gay’s dtoa.c routine is a popular implementation of this strategy.
- The “Grisu” family of algorithms was first described by Florian Loitsch. They use a very cheap integer-only procedure to determine the close-to-correct representation which is at least guaranteed to be shortest. The variant, Grisu3, actively detects if the resulting representation is incorrect.
We implement both algorithms with the necessary tweaks to suit our requirements.
In particular, the published literature falls short on actual implementation
difficulties such as how to avoid arithmetic overflows. Each implementation,
available in strategy::dragon and strategy::grisu respectively,
extensively describes all necessary justifications and many proofs for them.
(It is still difficult to follow though. You have been warned.)
Both implementations expose two public functions:
- format_shortest(decoded, buf), which always needs at least MAX_SIG_DIGITS digits of buffer. Implements the shortest mode.
- format_exact(decoded, buf, limit), which accepts as small as one digit of buffer. Implements exact and fixed modes.
They try to fill the u8 buffer with digits and return the number of digits
written and the exponent k. They are total for all finite f32 and f64
inputs (Grisu internally falls back to Dragon if necessary).
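To make the output convention concrete, the following sketch recovers a digit string and an exponent k in the form 0.d[0..n-1] * 10^k from the standard library's {:e} output. It is not the implementation of format_shortest. The function digits_and_exp is a hypothetical name, and it returns the digits themselves rather than filling a caller-supplied buffer.

```rust
/// Hypothetical illustration: returns ASCII digits d[0..n-1] and an exponent
/// k such that the positive finite value `v` prints as 0.d[0..n-1] * 10^k.
fn digits_and_exp(v: f64) -> (Vec<u8>, i16) {
    assert!(v.is_finite() && v > 0.0);
    // `{:e}` yields the shortest round-tripping form, e.g. "1.2345e3".
    let s = format!("{:e}", v);
    let (mantissa, exp) = s.split_once('e').expect("exponential form contains 'e'");
    // Drop the decimal point and keep only the digits.
    let digits: Vec<u8> = mantissa.bytes().filter(|b| b.is_ascii_digit()).collect();
    // "d.ddd * 10^exp" equals "0.dddd * 10^(exp + 1)".
    let k = exp.parse::<i16>().expect("f64 exponents fit in i16") + 1;
    (digits, k)
}
```

For example, 1234.5 is 0.12345 * 10^4, so the sketch returns the digits "12345" with k = 4, and 0.1 returns the single digit "1" with k = 0.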
The rendered digits are formatted into the actual string form with four functions:
- to_shortest_str prints the shortest representation, which can be padded by zeroes to make at least given number of fractional digits.
- to_shortest_exp_str prints the shortest representation, which can be padded by zeroes when its exponent is in the specified ranges, or can be printed in the exponential form such as 1.23e45.
- to_exact_exp_str prints the exact representation with given number of digits in the exponential form.
- to_exact_fixed_str prints the fixed representation with exactly given number of fractional digits.
They all return a slice of the preallocated Part array, whose elements correspond to
the individual parts of the string: a fixed string, a part of rendered digits,
a number of zeroes or a small (u16) number. The caller is expected to
provide a large enough buffer and Part array, and to assemble the final
string from the resulting Parts itself.
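The assembly step left to the caller might look like the sketch below. Piece is a hypothetical stand-in modelled on the four kinds of part listed above, not the actual Part type, and assemble is an assumed helper name.

```rust
/// Hypothetical stand-in for the parts described above.
enum Piece<'a> {
    /// A fixed string or a run of rendered digits, copied verbatim.
    Copy(&'a [u8]),
    /// The given number of '0' characters.
    Zeroes(usize),
    /// A small number printed in decimal, for example an exponent.
    Num(u16),
}

/// Concatenates the pieces into the final string.
fn assemble(pieces: &[Piece<'_>]) -> String {
    let mut out = String::new();
    for piece in pieces {
        match piece {
            Piece::Copy(bytes) => {
                out.push_str(std::str::from_utf8(bytes).expect("pieces hold ASCII"))
            }
            Piece::Zeroes(count) => out.extend(std::iter::repeat('0').take(*count)),
            Piece::Num(value) => out.push_str(&value.to_string()),
        }
    }
    out
}
```

Keeping zero padding as a count rather than as bytes means the digit buffer only has to hold the significant digits, however many fractional digits the caller asks for. For instance, the pieces "0.", "1" and two zeroes assemble to "0.100".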
All algorithms and formatting functions are accompanied by extensive tests
in the coretests::num::flt2dec module. It also shows how to use the individual
functions.
Modules
- decoder (Experimental): Decodes a floating-point value into individual parts and error ranges.
- estimator (Experimental): The exponent estimator.
- strategy (Experimental): Digit-generation algorithms.
Structs
- Decoded (Experimental): Decoded unsigned finite value, such that:
Enums
- FullDecoded (Experimental): Decoded unsigned value.
- Sign (Experimental): Sign formatting options.
Constants
- MAX_SIG_DIGITS (Experimental): The minimum size of buffer necessary for the shortest mode.
Traits
- DecodableFloat (Experimental): A floating point type which can be decoded.
Functions
- decode (Experimental): Returns a sign (true when negative) and FullDecoded value from given floating point number.
- Returns the static byte string corresponding to the sign to be formatted. It can be either "", "+" or "-".
- Formats given decimal digits 0.<...buf...> * 10^exp into the decimal form with at least given number of fractional digits. The result is stored to the supplied parts array and a slice of written parts is returned.
- Formats the given decimal digits 0.<...buf...> * 10^exp into the exponential form with at least the given number of significant digits. When upper is true, the exponent will be prefixed by E; otherwise that’s e. The result is stored to the supplied parts array and a slice of written parts is returned.
- Returns a rather crude approximation (upper bound) for the maximum buffer size calculated from the given decoded exponent.
- round_up (Experimental): When d contains decimal digits, increase the last digit and propagate carry. Returns a next digit when it causes the length to change.
- to_exact_exp_str (Experimental): Formats given floating point number into the exponential form with exactly given number of significant digits. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is used to determine the case of the exponent prefix (e or E). The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).
- to_exact_fixed_str (Experimental): Formats given floating point number into the decimal form with exactly given number of fractional digits. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is currently unused but left for the future decision to change the case of non-finite values, i.e., inf and nan. The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).
- to_shortest_exp_str (Experimental): Formats the given floating point number into the decimal form or the exponential form, depending on the resulting exponent. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is used to determine the case of non-finite values (inf and nan) or the case of the exponent prefix (e or E). The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).
- to_shortest_str (Experimental): Formats the given floating point number into the decimal form with at least given number of fractional digits. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is currently unused but left for the future decision to change the case of non-finite values, i.e., inf and nan. The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).
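The carry propagation described for round_up can be sketched as follows. This is written from the one-line description above rather than copied from the module, so the name round_up_sketch and the handling of an empty slice are assumptions.

```rust
/// Sketch of the described behaviour: `d` holds ASCII decimal digits.
/// Increments the number in place and returns `Some(extra_digit)` only when
/// the carry runs off the front, i.e. the result needs one more digit.
fn round_up_sketch(d: &mut [u8]) -> Option<u8> {
    // Find the rightmost digit that can absorb the carry.
    let last_non_nine = d.iter().rposition(|&c| c != b'9');
    match last_non_nine {
        Some(i) => {
            // Bump that digit; every digit after it was a 9 and becomes 0.
            d[i] += 1;
            d[i + 1..].fill(b'0');
            None
        }
        None if !d.is_empty() => {
            // All nines: 999 becomes 100, and the caller appends the extra 0.
            d[0] = b'1';
            d[1..].fill(b'0');
            Some(b'0')
        }
        // Assumption: an empty slice rounds up to the single digit 1.
        None => Some(b'1'),
    }
}
```

When a digit is returned, the caller is expected to append it if the buffer has room and to increase the exponent by one, so that "999" with exponent k becomes "1000" with exponent k + 1.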