fastnumbers
Super-fast and clean conversions to numbers.
- Source Code: https://github.com/SethMMorton/fastnumbers
- Downloads: https://pypi.org/project/fastnumbers/
- Documentation: https://fastnumbers.readthedocs.io/
- Quick Start
- Timing
- High-level Algorithm
- How To Run Tests
- History
fastnumbers is a module with the following three objectives (in order of decreasing importance as to why the module was created):
- Provide a set of convenience functions that wrap calls to int and float and provide easy, concise, powerful, fast and flexible error handling.
- Provide a set of functions that can be used to rapidly identify if an input could be converted to int or float.
- Provide drop-in replacements for the Python built-in int and float that are on par with or faster than the Python equivalents (see the Timing section for details). These functions should behave identically to the Python built-ins except for a few specific corner-cases as mentioned in the API documentation for those functions.
  - PLEASE read the quick start for these functions to fully understand the caveats before using them.
What kind of speedups can you expect? Here are some highlights, but please see the Timing section for the raw data if you want details.
- Up to 2x faster conversion of strings to integers than the built-in int() function
- Up to 5x faster conversion of strings to floats than the built-in float() function (possibly greater for very long strings)
- Up to 10x faster handling of errors during conversion than using user-side error handling
- On top of the above, operations to convert a list of strings (with the map option or the try_array function) are 2x faster than the equivalent list comprehension.
NOTICE: As of fastnumbers version 4.0.0, only Python >= 3.7 is supported.
NOTICE: As of fastnumbers version 4.0.0, the functions fast_real, fast_float, fast_int, fast_forceint, isreal, isfloat, isint, and isintlike have been deprecated and are replaced with try_real, try_float, try_int, try_forceint, check_real, check_float, check_int, and check_intlike, respectively. These new functions have more flexible APIs and have names that better reflect the intent of the functions. The old functions can still be used (they will never be removed from fastnumbers), but the new ones should be preferred for new development.
NOTICE: As of fastnumbers version 4.0.0, query_type now sets allow_underscores to False by default instead of True.
Quick Start
There are three broad categories of functions exposed by fastnumbers. The quick start below will demonstrate each of these categories. The quick start is "by example", and will show a sample interactive session using the fastnumbers API.
Error-Handling Functions
- Error-handling function API
- Fast operations on lists and other iterables
- About the on_fail option
- About the denoise option
try_float will be used to demonstrate the functionality of the try_* functions.
>>> from fastnumbers import RAISE, try_float
>>> # Convert string to a float
>>> try_float('56.07')
56.07
>>> # Integers are converted to floats
>>> try_float(54)
54.0
>>>
>>> # Unconvertable string returned as-is by default
>>> try_float('bad input')
'bad input'
>>> # Unconvertable strings can trigger a default value
>>> try_float('bad input', on_fail=0)
0
>>>
>>> # One can ask inf or nan to be substituted with another value
>>> try_float('nan')
nan
>>> try_float('nan', nan=0.0)
0.0
>>> try_float(float('nan'), nan=0.0)
0.0
>>> try_float('56.07', nan=0.0)
56.07
>>>
>>> # The default built-in float behavior can be triggered with
>>> # RAISE given to "on_fail".
>>> try_float('bad input', on_fail=RAISE) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError: invalid literal for float(): bad input
>>>
>>> # A function can be used to return an alternate value for invalid input
>>> try_float('bad input', on_fail=len)
9
>>> try_float(54, on_fail=len)
54.0
>>>
>>> # Single unicode characters can be converted.
>>> try_float('\u2164') # Roman numeral 5 (V)
5.0
>>> try_float('\u2466') # 7 enclosed in a circle
7.0
try_int behaves the same as try_float, but for integers.
>>> from fastnumbers import try_int
>>> try_int('1234')
1234
>>> try_int('\u2466')
7
try_real is like try_float or try_int depending on whether there is any fractional component of the return value.
>>> from fastnumbers import try_real
>>> try_real('56')
56
>>> try_real('56.0')
56
>>> try_real('56.0', coerce=False)
56.0
>>> try_real('56.07')
56.07
>>> try_real(56.07)
56.07
>>> try_real(56.0)
56
>>> try_real(56.0, coerce=False)
56.0
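For valid input, the coercion rules of try_real shown above can be summarized with a short pure-Python model. real_sketch is a hypothetical name; it ignores on_fail, on_type_error, denoise and the other options, and it is not how fastnumbers implements the conversion.

```python
def real_sketch(value, coerce=True):
    """Hypothetical pure-Python model of try_real for *valid* input only.

    Not the fastnumbers implementation: no on_fail, on_type_error or
    denoise handling.
    """
    if isinstance(value, str):
        try:
            return int(value)          # '56' -> 56
        except ValueError:
            value = float(value)       # '56.0' -> 56.0; raises ValueError on junk
    if isinstance(value, float) and coerce and value.is_integer():
        return int(value)              # 56.0 -> 56 (inf/nan are left as floats)
    return value                       # 56.07 stays 56.07; ints pass through


print(real_sketch("56.0"), real_sketch("56.0", coerce=False), real_sketch(56.07))
```

With coerce=False, a float whose fractional part is zero stays a float, mirroring the coerce=False examples above.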
try_forceint always returns an integer.
>>> from fastnumbers import try_forceint
>>> try_forceint('56')
56
>>> try_forceint('56.0')
56
>>> try_forceint('56.07')
56
>>> try_forceint(56.07)
56
Fast operations on lists and other iterables
Each of the try_* functions has a map option that causes the function to accept an iterable of items to convert and return the converted results. Using try_float as an example, the following are all functionally equivalent.
>>> from fastnumbers import try_float
>>> iterable = ["5", "4.5", "34567.6", "32"]
>>> try_float(iterable, map=list) == list(map(try_float, iterable))
True
>>> try_float(iterable, map=list) == [try_float(x) for x in iterable]
True
>>> try_float(iterable, map=list) == list(try_float(iterable, map=True))
True
The difference is that the map option is 2x the speed of the list comprehension method, and 1.5x the speed of the map method. The reason is that it avoids Python function call overhead on each iteration. Note that map=True causes the function to return an iterator, and map=list causes it to return a list. In practice the performance of these two is similar (see Timing for raw data).
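The map=True versus map=list distinction can be modelled in pure Python as below. convert_many is a hypothetical helper, not part of fastnumbers: it reproduces only the return types, because the speedup described above comes from fastnumbers running the loop in C.

```python
import builtins


def convert_many(convert, iterable, map=False):
    """Hypothetical model of the ``map`` option's return types.

    map=list -> a fully built list; map=True -> a lazy iterator.
    The parameter is named ``map`` to mirror fastnumbers, so the
    built-in is reached through ``builtins.map``.
    """
    if map is list:
        return [convert(x) for x in iterable]
    if map is True:
        return builtins.map(convert, iterable)
    raise ValueError("this sketch only models map=True and map=list")


iterable = ["5", "4.5", "34567.6", "32"]
as_list = convert_many(float, iterable, map=list)  # [5.0, 4.5, 34567.6, 32.0]
as_iter = convert_many(float, iterable, map=True)  # nothing converted yet
```

The iterator form only does work as it is consumed, which is why it can be preferable when the results are streamed rather than stored.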
If you need to store your output in a numpy array, you can use try_array to do this conversion directly. This function has some additional overflow handling, not present in the other fastnumbers functions, that may come in handy when dealing with numpy arrays.
>>> from fastnumbers import try_array
>>> import numpy as np
>>> iterable = ["5", "4.5", "34567.6", "32"]
>>> np.array_equal(np.array(try_float(iterable, map=list), dtype=np.float64), try_array(iterable))
True
You will see about a 2x speedup from doing this in one step rather than converting to a list and then converting that list to an array.
About the on_fail option
The on_fail option is a way for you to do anything in the event that the given input cannot be converted to a number. It can
- return the given object as-is if set to fastnumbers.INPUT (this is the default)
- raise a ValueError if set to fastnumbers.RAISE
- return a default value if given any non-callable object
- call a function with the given object if given a single-argument callable
Below are a couple of ideas to get you thinking.
NOTE: There is also an on_type_error option that behaves the same as on_fail except that a) it is triggered when the given object is of an invalid type and b) the default value is fastnumbers.RAISE, not fastnumbers.INPUT.
>>> from fastnumbers import INPUT, RAISE, try_float
>>> # You want to convert strings that can be converted to numbers, but
>>> # leave the rest as strings. Use fastnumbers.INPUT (the default)
>>> try_float('45.6')
45.6
>>> try_float('invalid input')
'invalid input'
>>> try_float('invalid input', on_fail=INPUT)
'invalid input'
>>>
>>>
>>>
>>> # You want to convert any invalid string to NaN
>>> try_float('45.6', on_fail=float('nan'))
45.6
>>> try_float('invalid input', on_fail=float('nan'))
nan
>>>
>>>
>>>
>>> # Simple callable case, send the input through some function to generate a number.
>>> try_float('invalid input', on_fail=lambda x: float(x.count('i'))) # count the 'i's
3.0
>>>
>>>
>>>
>>> # Suppose we know that our input could either be a number, or if not
>>> # then we know we just have to strip off parens to get to the number
>>> # e.g. the input could be '45' or '(45)'. Also, suppose that if it
>>> # still cannot be converted to a number we want to raise an exception.
>>> def strip_parens_and_try_again(x):
...     return try_float(x.strip('()'), on_fail=RAISE)
...
>>> try_float('45', on_fail=strip_parens_and_try_again)
45.0
>>> try_float('(45)', on_fail=strip_parens_and_try_again)
45.0
>>> try_float('invalid input', on_fail=strip_parens_and_try_again) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError: invalid literal for float(): invalid input
>>>
>>>
>>>
>>> # Suppose that whenever an invalid input is given, it needs to be
>>> # logged and then a default value is returned.
>>> def log_and_default(x, log_method=print, default=0.0):
...     log_method("The input {!r} is not valid!".format(x))
...     return default
...
>>> try_float('45', on_fail=log_and_default)
45.0
>>> try_float('invalid input', on_fail=log_and_default)
The input 'invalid input' is not valid!
0.0
>>> try_float('invalid input', on_fail=lambda x: log_and_default(x, default=float('nan')))
The input 'invalid input' is not valid!
nan
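To make the four on_fail behaviors concrete, here is a minimal pure-Python model of the dispatch, built on the built-in float. The INPUT and RAISE objects are local stand-ins for fastnumbers.INPUT and fastnumbers.RAISE, and float_or_fallback is a hypothetical name; the real try_float also accepts input, such as the single Unicode numeric characters shown earlier, that the built-in float rejects.

```python
# Hypothetical stand-ins for fastnumbers.INPUT and fastnumbers.RAISE.
INPUT = object()
RAISE = object()


def float_or_fallback(value, on_fail=INPUT):
    """Pure-Python model of the on_fail dispatch (not fastnumbers itself).

    A TypeError for an invalid type is not caught here, which mirrors
    the on_type_error=RAISE default.
    """
    try:
        return float(value)
    except ValueError:
        if on_fail is INPUT:
            return value           # hand back the input unchanged
        if on_fail is RAISE:
            raise                  # re-raise the original ValueError
        if callable(on_fail):
            return on_fail(value)  # let the caller compute a replacement
        return on_fail             # any non-callable object is a default


print(float_or_fallback("45.6"), float_or_fallback("invalid input", on_fail=len))
```

Checking the sentinels before callable() matters: a sentinel must never be mistaken for a default value or a function.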
About the denoise option
The denoise option is available on the try_real and try_forceint functions.
To best understand its usage, consider the following native Python behavior:
>>> int(3.453e21)
3452999999999999737856
>>> int(float("3.453e21"))
3452999999999999737856
>>> # Most users would likely expect this result from decimal.Decimal
>>> import decimal
>>> int(decimal.Decimal("3.453e21"))
3453000000000000000000
>>> # But watch out, even decimal.Decimal doesn't help for float input
>>> import decimal
>>> int(decimal.Decimal(3.453e21))
3452999999999999737856
Because the conversion of a float to an int goes through the C double data type, which has inherent limitations on accuracy (see this Stack Overflow question for examples), the resulting int has "noise" digits that are not part of the original float representation.
For functions where this makes sense, fastnumbers provides the denoise option to give you the results that decimal.Decimal would give for strings containing floats.
>>> from fastnumbers import try_real
>>> try_real(3.453e21)
3452999999999999737856
>>> try_real("3.453e21")
3452999999999999737856
>>> try_real(3.453e21, denoise=True)
3453000000000000000000
>>> try_real("3.453e21", denoise=True)
3453000000000000000000
Two things to keep in mind:
- The denoise option adds additional overhead to the conversion calculation, so please consider the trade-offs between speed and accuracy when determining whether or not to use it. It is significantly faster than using decimal.Decimal, but much slower than not using it at all.
- For string input, denoise will return results identical to decimal.Decimal. For float input, denoise will return results that are accurate to about 15 digits (C double can only store 16 decimal digits, so this means that only the last possible digit may not be accurate).
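The effect of denoise can be approximated with the standard library alone by routing float input through its shortest round-trip repr() string before handing it to decimal.Decimal. This is a sketch under that assumption, not how fastnumbers implements denoise: repr() may keep up to 17 significant digits, so results can differ from the roughly 15-digit accuracy described above in the last digit. denoised_int is a hypothetical name, and like int() it truncates any fractional part and only handles finite values.

```python
from decimal import Decimal


def denoised_int(value):
    """Approximate denoise using only the standard library (a sketch).

    Float input is first turned into its shortest round-trip string via
    repr(), e.g. 3.453e21 -> '3.453e+21', so the binary "noise" digits
    never reach the integer conversion.
    """
    if isinstance(value, float):
        value = repr(value)
    return int(Decimal(value))  # exact decimal -> int, truncating toward zero


print(int(3.453e21))           # noisy: goes straight through the C double
print(denoised_int(3.453e21))  # follows the shortest decimal representation
print(denoised_int("3.453e21"))
```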
Checking Functions
check_float will be used to demonstrate the functionality of the check_* functions. There is also the query_type function.
>>> from fastnumbers import check_float
>>> from fastnumbers import ALLOWED, DISALLOWED, NUMBER_ONLY, STRING_ONLY
>>> # Check that a string can be converted to a float
>>> check_float('56')
True
>>> check_float('56', strict=True)
False
>>> check_float('56.07')
True
>>> check_float('56.07 lb')
False
>>>
>>> # Check if a given number is a float
>>> check_float(56.07)
True
>>> check_float(56)
False
>>>
>>> # Specify if only strings or only numbers are allowed
>>> check_float(56.07, consider=STRING_ONLY)
False
>>> check_float('56.07', consider=NUMBER_ONLY)
False
>>>
>>> # Customize handling for nan or inf (see API for more details)
>>> check_float('nan')
False
>>> check_float('nan', nan=ALLOWED)
True
>>> check_float(float('nan'))
True
>>> check_float(float('nan'), nan=DISALLOWED)
False
check_int works the same as check_float, but for integers.
>>> from fastnumbers import check_int
>>> check_int('56')
True
>>> check_int(56)
True
>>> check_int('56.0')
False
>>> check_int(56.0)
False
check_real is very permissive - any float or integer is accepted.
>>> from fastnumbers import check_real
>>> check_real('56.0')
True
>>> check_real('56')
True
>>> check_real(56.0)
True
>>> check_real(56)
True
check_intlike checks if a number is "int-like", meaning it has no fractional component.
>>> from fastnumbers import check_intlike
>>> check_intlike('56.0')
True
>>> check_intlike('56.7')
False
>>> check_intlike(56.0)
True
>>> check_intlike(56.7)
False
The query_type function can be used if you need to determine if a value is one of many types, rather than whether or not it is one specific type.
>>> from fastnumbers import query_type
>>> query_type('56.0')
<class 'float'>
>>> query_type('56')
<class 'int'>
>>> query_type(56.0)
<class 'float'>
>>> query_type(56)
<class 'int'>
>>> query_type(56.0, coerce=True)
<class 'int'>
>>> query_type('56.0', allowed_types=(float, int))
<class 'float'>
>>> query_type('hey')
<class 'str'>
>>> query_type('hey', allowed_types=(float, int)) # returns None
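A rough pure-Python model of the classification query_type performs on str, int and float input may help when reasoning about its results. query_type_sketch is hypothetical and covers only the allowed_types and coerce parameters shown above. Note one known difference: Python's int() and float() accept underscores such as '1_000', whereas query_type rejects them by default as of version 4.0.0.

```python
def query_type_sketch(value, allowed_types=None, coerce=False):
    """Hypothetical pure-Python model of query_type (not fastnumbers)."""
    if isinstance(value, str):
        try:
            int(value)
            result = int             # '56' looks like an int
        except ValueError:
            try:
                float(value)
                result = float       # '56.0' looks like a float
            except ValueError:
                result = str         # not numeric at all
    else:
        result = type(value)         # numbers report their own type
    # coerce: a float with no fractional part is reported as int
    if coerce and result is float and float(value).is_integer():
        result = int
    if allowed_types is not None and result not in allowed_types:
        return None                  # type not in the caller's allowed set
    return result


print(query_type_sketch("56.0"), query_type_sketch(56.0, coerce=True))
```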
Drop-in Replacement Functions
PLEASE do not take it for granted that these functions will provide you with a speedup - they may not. Every platform, compiler, and data-set is different, and you should perform a timing test on your system with your data to evaluate if you will see a benefit. As you can see from the data linked in the Timing section, the amount of speedup you will get is particularly data-dependent. In general you will see a performance boost for floats (and this boost increases as the size of the float increases), but for integers it is largely dependent on the length of the integer. You will likely not see a performance boost if the inputs are already numbers instead of strings.
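Following that advice, a small timing harness built on the standard timeit module lets you compare a built-in against a replacement on your own data. time_converter and the sample strings are illustrative assumptions; substitute a representative slice of your real input.

```python
import timeit


def time_converter(convert, data, repeat=5, number=200):
    """Best-of-``repeat`` seconds to run ``convert`` over all of ``data``.

    ``convert`` is bound to a local name, so no module attribute lookup
    is timed; taking the minimum reduces noise from other processes.
    """
    def run():
        for item in data:
            convert(item)

    return min(timeit.repeat(run, repeat=repeat, number=number))


sample = ["56", "-4", "1234567", "98765432101"]  # replace with your real data
builtin_seconds = time_converter(int, sample)
print(f"built-in int: {builtin_seconds:.6f} s")
```

With fastnumbers installed, run it a second time with the replacement (for example, after from fastnumbers import int as fn_int, call time_converter(fn_int, sample)) and compare the two timings.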
NOTE: In the examples below, we use from fastnumbers import int instead of import fastnumbers. This is because calling fastnumbers.int() is a bit slower than just int(), because Python has to first find fastnumbers in your namespace and then find int in the fastnumbers namespace, instead of just finding int in your namespace - this will slow down the function call and defeat the purpose of using fastnumbers. If you do not want to actually shadow the built-in int function, you can do from fastnumbers import int as fn_int or something like that.
>>> # Use is identical to the built-in functions
>>> from fastnumbers import float, int
>>> float('10')
10.0
>>> int('10')
10
>>> float('bad input') #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError: invalid literal for float(): bad input
real is provided to give a float or int depending on the fractional component of the input.
>>> from fastnumbers import real
>>> real('56.0')
56
>>> real('56.7')
56.7
>>> real('56.0', coerce=False)
56.0
Timing
Just how much faster is fastnumbers than a pure Python implementation? Please see the profiling data at https://github.com/SethMMorton/fastnumbers/tree/main/profiling.
High-Level Algorithm
For integers, CPython goes to great lengths to ensure that your string input is converted to a number correctly and losslessly (you can prove this to yourself by examining the source code for integer conversions). This extra effort is only needed for integers that cannot fit into a 64-bit integer data type - for those that can, a naive algorithm of < 10 lines of C code is sufficient and significantly faster. fastnumbers uses a heuristic to determine if the input can be safely converted with the much faster naive algorithm, and if so it does so, falling back on the CPython implementation for longer input strings. Most real-world numbers pass the heuristic and so you should generally see improved performance with fastnumbers for integers.
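The fast-path/fallback structure can be sketched in pure Python. The 18-digit threshold and the name parse_int are assumptions for illustration, not fastnumbers' actual heuristic. In Python the hand-written loop is not faster than int(); only the control flow carries over to a C implementation.

```python
def parse_int(text):
    """Sketch of a naive fast path with a fallback to CPython's int().

    Assumption: up to 18 ASCII digits (plus an optional sign) always fit
    in a signed 64-bit integer, since 10**18 - 1 < 2**63 - 1.
    """
    body = text.strip()
    sign = 1
    if body[:1] in ("+", "-"):
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if 0 < len(body) <= 18 and body.isascii() and body.isdigit():
        # Naive algorithm: one multiply-add per digit, no overflow possible.
        value = 0
        for ch in body:
            value = value * 10 + (ord(ch) - ord("0"))
        return sign * value
    # Long input, underscores, non-ASCII digits or junk: defer to CPython,
    # which also raises ValueError for invalid strings.
    return int(text)


print(parse_int("-42"), parse_int("123456789012345678901234567890"))
```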
For floats, fastnumbers utilizes the ultra-fast fast_float::from_chars function to convert strings representing floats into a C double both quickly and safely - the conversion provides the same accuracy as the CPython float conversion function, but instead of scaling linearly with the length of the input string it seems to have roughly constant performance. By completely bypassing the CPython converter we get significant performance gains with no penalty, so you should always see improved performance with fastnumbers for floats.
Installation
Use pip!
$ pip install fastnumbers
How to Run Tests
Please note that fastnumbers is NOT set-up to support python setup.py test.
The recommended way to run tests is with tox. Suppose you want to run tests for Python 3.8 - you can run tests by simply executing the following:
$ tox run -e py38
tox will create a virtual environment for your tests and install all the needed testing requirements for you.
If you want to run tests on all supported Python versions, you can simply execute
$ tox run
You can change how much "random" input your tests will try with
# Run fewer tests with "random" input - much faster
$ tox run -- --hypothesis-profile fast
# Run more tests with "random" input - takes much longer but is more thorough
$ tox run -- --hypothesis-profile thorough
If you want to run the performance analysis yourself, you can execute
# This assumes Python 3.9 - adjust for the version you want to profile
$ tox run -e py39-prof
If you do not wish to use tox, you can install the testing dependencies with the dev/requirements.txt file and then run the tests manually using pytest.
$ pip install -r dev/requirements.txt
$ pytest
Author
Seth M. Morton
History
Please visit the changelog on GitHub.