Module core::num::flt2dec

source ·
🔬This is a nightly-only experimental API. (flt2dec)
Expand description

Floating-point number to decimal conversion routines.

Problem statement

We are given the floating-point number v = f * 2^e with an integer f, and its bounds minus and plus such that any number between v - minus and v + plus will be rounded to v. For the simplicity we assume that this range is exclusive. Then we would like to get the unique decimal representation V = 0.d[0..n-1] * 10^k such that:

  • d[0] is non-zero.

  • It’s correctly rounded when parsed back: v - minus < V < v + plus. Furthermore it is shortest such one, i.e., there is no representation with less than n digits that is correctly rounded.

  • It’s closest to the original value: abs(V - v) <= 10^(k-n) / 2. Note that there might be two representations satisfying this uniqueness requirement, in which case some tie-breaking mechanism is used.

We will call this mode of operation as to the shortest mode. This mode is used when there is no additional constraint, and can be thought as a “natural” mode as it matches the ordinary intuition (it at least prints 0.1f32 as “0.1”).

We have two more modes of operation closely related to each other. In these modes we are given either the number of significant digits n or the last-digit limitation limit (which determines the actual n), and we would like to get the representation V = 0.d[0..n-1] * 10^k such that:

  • d[0] is non-zero, unless n was zero in which case only k is returned.

  • It’s closest to the original value: abs(V - v) <= 10^(k-n) / 2. Again, there might be some tie-breaking mechanism.

When limit is given but not n, we set n such that k - n = limit so that the last digit d[n-1] is scaled by 10^(k-n) = 10^limit. If such n is negative, we clip it to zero so that we will only get k. We are also limited by the supplied buffer. This limitation is used to print the number up to given number of fractional digits without knowing the correct k beforehand.

We will call the mode of operation requiring n as to the exact mode, and one requiring limit as to the fixed mode. The exact mode is a subset of the fixed mode: the sufficiently large last-digit limitation will eventually fill the supplied buffer and let the algorithm to return.

Implementation overview

It is easy to get the floating point printing correct but slow (Russ Cox has demonstrated how it’s easy), or incorrect but fast (naïve division and modulo). But it is surprisingly hard to print floating point numbers correctly and efficiently.

There are two classes of algorithms widely known to be correct.

  • The “Dragon” family of algorithm is first described by Guy L. Steele Jr. and Jon L. White. They rely on the fixed-size big integer for their correctness. A slight improvement was found later, which is posthumously described by Robert G. Burger and R. Kent Dybvig. David Gay’s dtoa.c routine is a popular implementation of this strategy.

  • The “Grisu” family of algorithm is first described by Florian Loitsch. They use very cheap integer-only procedure to determine the close-to-correct representation which is at least guaranteed to be shortest. The variant, Grisu3, actively detects if the resulting representation is incorrect.

We implement both algorithms with necessary tweaks to suit our requirements. In particular, published literatures are short of the actual implementation difficulties like how to avoid arithmetic overflows. Each implementation, available in strategy::dragon and strategy::grisu respectively, extensively describes all necessary justifications and many proofs for them. (It is still difficult to follow though. You have been warned.)

Both implementations expose two public functions:

  • format_shortest(decoded, buf), which always needs at least MAX_SIG_DIGITS digits of buffer. Implements the shortest mode.

  • format_exact(decoded, buf, limit), which accepts as small as one digit of buffer. Implements exact and fixed modes.

They try to fill the u8 buffer with digits and returns the number of digits written and the exponent k. They are total for all finite f32 and f64 inputs (Grisu internally falls back to Dragon if necessary).

The rendered digits are formatted into the actual string form with four functions:

  • to_shortest_str prints the shortest representation, which can be padded by zeroes to make at least given number of fractional digits.

  • to_shortest_exp_str prints the shortest representation, which can be padded by zeroes when its exponent is in the specified ranges, or can be printed in the exponential form such as 1.23e45.

  • to_exact_exp_str prints the exact representation with given number of digits in the exponential form.

  • to_exact_fixed_str prints the fixed representation with exactly given number of fractional digits.

They all return a slice of preallocated Part array, which corresponds to the individual part of strings: a fixed string, a part of rendered digits, a number of zeroes or a small (u16) number. The caller is expected to provide a large enough buffer and Part array, and to assemble the final string from resulting Parts itself.

All algorithms and formatting functions are accompanied by extensive tests in coretests::num::flt2dec module. It also shows how to use individual functions.

Modules

  • decoderExperimental
    Decodes a floating-point value into individual parts and error ranges.
  • estimatorExperimental
    The exponent estimator.
  • strategyExperimental
    Digit-generation algorithms.

Structs

  • DecodedExperimental
    Decoded unsigned finite value, such that:

Enums

  • FullDecodedExperimental
    Decoded unsigned value.
  • SignExperimental
    Sign formatting options.

Constants

  • MAX_SIG_DIGITSExperimental
    The minimum size of buffer necessary for the shortest mode.

Traits

Functions

  • decodeExperimental
    Returns a sign (true when negative) and FullDecoded value from given floating point number.
  • determine_sign 🔒 Experimental
    Returns the static byte string corresponding to the sign to be formatted. It can be either "", "+" or "-".
  • digits_to_dec_str 🔒 Experimental
    Formats given decimal digits 0.<...buf...> * 10^exp into the decimal form with at least given number of fractional digits. The result is stored to the supplied parts array and a slice of written parts is returned.
  • digits_to_exp_str 🔒 Experimental
    Formats the given decimal digits 0.<...buf...> * 10^exp into the exponential form with at least the given number of significant digits. When upper is true, the exponent will be prefixed by E; otherwise that’s e. The result is stored to the supplied parts array and a slice of written parts is returned.
  • estimate_max_buf_len 🔒 Experimental
    Returns a rather crude approximation (upper bound) for the maximum buffer size calculated from the given decoded exponent.
  • round_upExperimental
    When d contains decimal digits, increase the last digit and propagate carry. Returns a next digit when it causes the length to change.
  • to_exact_exp_strExperimental
    Formats given floating point number into the exponential form with exactly given number of significant digits. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is used to determine the case of the exponent prefix (e or E). The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).
  • to_exact_fixed_strExperimental
    Formats given floating point number into the decimal form with exactly given number of fractional digits. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is currently unused but left for the future decision to change the case of non-finite values, i.e., inf and nan. The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).
  • Formats the given floating point number into the decimal form or the exponential form, depending on the resulting exponent. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is used to determine the case of non-finite values (inf and nan) or the case of the exponent prefix (e or E). The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).
  • to_shortest_strExperimental
    Formats the given floating point number into the decimal form with at least given number of fractional digits. The result is stored to the supplied parts array while utilizing given byte buffer as a scratch. upper is currently unused but left for the future decision to change the case of non-finite values, i.e., inf and nan. The first part to be rendered is always a Part::Sign (which can be an empty string if no sign is rendered).