rangeless::fn
range
-free LINQ-like library of higher-order functions for manipulation of containers and lazy input-sequences.
Documentation
What it's for
- Reduce the amount of mutable state.
- Flatten control-flow.
- Lessen the need to deal with iterators directly.
- Make the code more expressive and composeable.
This library is intended for moderate to advanced-level c++ programmers that like the idea of c++ ranges
, but can't or choose not to use them for various reasons (e.g. high complexity, compilation overhead, debug-build performance, size of the library, etc).
Motivations:
- https://www.fluentcpp.com/2019/09/13/the-surprising-limitations-of-c-ranges-beyond-trivial-use-cases/
- https://www.fluentcpp.com/2019/02/12/the-terrible-problem-of-incrementing-a-smart-iterator/
- https://brevzin.github.io/c++/2020/07/06/split-view/
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2011r1.html
- https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
Features
- Portable c++11. (examples are c++14)
- Single-header.
- Minimal standard library dependencies.
- No inheritance, polymorphism, type-erasures, ADL, advanced metaprogramming, enable_ifs, concepts, preprocessor magic, arcane programming techniques (for some definition of arcane), or compiler-specific workarounds.
- Low
#include
and compile-time overhead. - Enables trivial parallelization (see
fn::to_async and fn::transform_in_parallel
). - Allows for trivial extension of functionality (see
fn::adapt
).
Simple examples
struct employee_t
{
int id;
std::string last_name;
std::string first_name;
int years_onboard;
bool operator<(const employee_t& other) const
{
return id < other.id;
}
};
namespace fn = rangeless::fn;
using fn::operators::operator%; // arg % fn equivalent to fn(std::forward<Arg>(arg))
using fn::operators::operator%=; // arg %= fn; equivalent to arg = fn( std::move(arg));
// Abbreviated lambda macro, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0573r2.html
// (It is not defined in this library, because nobody likes macros except those we define ourselves).
#define L(expr) ([&](auto&& _ ){ return expr; })
auto employees = std::vector<employee_t>{/*...*/};
employees %= fn::where L( _.last_name != "Doe" );
employees %= fn::take_top_n_by(10, L( _.years_onboard ));
employees %= fn::sort_by L( std::tie( _.last_name, _.first_name) );
// or:
employees = std::move(employees)
% fn::where L( _.last_name != "Doe" )
% fn::take_top_n_by(10, L( _.years_onboard ))
% fn::sort_by L( std::tie( _.last_name, _.first_name) );
How does this work? E.g. fn::sort_by(projection_fn)
is a higher-order function that returns a unary function that takes inputs by value (normally passed as rvalue), sorts them by the user-provided projection, and returns them by value (non-copying). Similarly, fn::where(predicate)
filters the input to those satisfying the predicate.
operator %
, pronounced "then", invoked as arg % unary_function
, is syntax-sugar, similar to F#'s operator |>
, that enables structuring your code in top-down manner, consistent with the direction of the data-flow, similar to UNIX pipes. It is implemented as:
template<typename Arg, typename F>
auto operator % (Arg&& arg, F&& fn) -> decltype( std::forward<F>(fn)( std::forward<Arg>(arg)) )
{
return std::forward<F>(fn)( std::forward<Arg>(arg));
}
Without using operator%
the above example would look as follows:
employees = fn::where L( _.last_name != "Doe" )( std::move(employees) );
employees = fn::take_top_n_by(10, L( _.years_onboard ))( std::move(employees) );
employees = fn::sort_by L( std::tie( _.last_name, _.first_name) )( std::move(employees) );
// or, as single nested function call:
employees = fn::sort_by L( std::tie( _.last_name, _.first_name))(
fn::take_top_n_by(10, L( _.years_onboard ))(
fn::where L( _.last_name != "Doe" )(
std::move(employees) )));
Example: Top-5 most frequent words chosen among the words of the same length.
auto my_isalnum = L( std::isalnum(_) || _ == '_' );
fn::from( std::istreambuf_iterator<char>(istr.rdbuf()), {})
% fn::transform L( char(std::tolower(_)) )
% fn::group_adjacent_by(my_isalnum) // returns sequence-of-std::string
% fn::where L( my_isalnum( _.front())) // discard strings with punctuation
% fn::counts() // returns map<string,size_t> of word->count
% fn::group_all_by L( _.first.size()) // returns [[(word, count)]], each subvector containing words of same length
% fn::transform( // transform each sub-vector...
fn::take_top_n_by(5UL, L( _.second)) // by filtering it taking top-5 by count.
% fn::concat() // undo group_all_by (flatten)
% fn::for_each L( void(std::cout << _.first << "\t" << _.second << "\n") );
// compilation time:
// >>time g++ -I ../include/ -std=c++14 -o test.o -c test.cpp
// real 0m1.176s
// user 0m1.051s
// sys 0m0.097s
More examples
- A rudimentary lazy TSV parser.
- calendar.cpp vs. Haskell vs. range-v3 implementation.
- aln_filter.cpp for more advanced examples of use.
Description
Unlike range-v3
, this library is centered around value-semantics rather than reference-semantics. This library does not know or deal with the multitude of range-related concepts; rather, it deals with data transformations via higher-order functions. It differentiates between two types of inputs: a Container
and a lazy seq<NullaryInvokable>
satisfying single-pass forward-only InputRange
semantics (also known as a data-stream). Most of the function-objects in this library have two overloads of operator()
respectively. Rather than composing views over ranges as with range-v3
, operator()
s take inputs by value, operate on it eagerly or compose a lazy seq
, as appropriate (following the Principle of Least Astonishment), and return the result by value (with move-semantics) to the next stage.
E.g.
fn::where
- given a container, passed by rvalue, returns the same container filtered to elements satisfying the predicate, using the erase-remove or iterate-erase idioms under the hood, as appropriate.
- given a container, passed by lvalue-reference, returns a copy of the container with elements satisfying the predicate, using
std::copy_if
under the hood. - given a
seq
, passed by value, composes and returns aseq
that will skip the elements not satisfying the predicate (lazy).
fn::sort
- given a container, passed by value, returns the sorted container.
- given a
seq
, passed by value, moves elements into astd::vector
, and delegates to the above.
fn::transform
- given a
seq
, passed by value, returns aseq
wrapping a composition of the transform-function over the underlyingNullaryInvokable
that will lazily yield the results of transform-function. - given a container, passed by value, wraps it as
seq
and delegates to the above (i.e. also lazy).
- given a
Some functions in this library internally buffer elements, as appropriate, with single-pass streaming inputs, whereas range-v3
, on the other hand, imposes multipass ForwardRange or stronger requirement on the inputs in situations that would otherwise require buffering. This makes this library conceptually more similar to UNIX pipelines with eager sort
and lazy sed
, than to c++ ranges.
Operations | Buffering behavior | Laziness |
---|---|---|
fn::group_adjacent_by , fn::in_groups_of |
buffer elements of the incoming group | lazy |
fn::unique_all_by |
buffer unique keys of elements seen so far | lazy |
fn::drop_last , fn::sliding_window |
buffer a queue of last n elements |
lazy |
fn::transform_in_parallel |
buffer a queue of n executing async-tasks |
lazy |
fn::group_all_by , fn::sort_by , fn::lazy_sort_by , fn::reverse , fn::to_vector |
buffer all elements | eager |
fn::take_last |
buffer a queue of last n elements |
eager |
fn::where_max_by , fn::where_min_by |
buffer maximal/minimal elements as seen so-far | eager |
fn::take_top_n_by |
buffer top n elements as seen so-far |
eager |
end-of-sequence
from a generator-function
Signaling More often than not a generator-function that yields a sequence of values will not be an infinite Fibonacci sequence, but rather some bounded sequence of objects, either from a file, a socket, a database query, etc, so we need to be able to signal end-of-sequence. One way to do it is to yield elements wrapped in std::unique_ptr
or std::optional
:
fn::seq([]() -> std::unique_ptr<...> { ... })
% fn::take_while([](const auto& x) { return bool(x); })
% fn::transform(fn::get::dereferenced{})
% ...
If your value-type has an "empty" state interpretable as end-of-inputs, you can use the value-type directly without wrapping.
If you don't care about incurring an exception-handling overhead once per whole seq, there's a simpler way of doing it: just return fn::end_seq()
from the generator function (e.g. see my_intersperse example). This throws end-of-sequence exception that is caught under the hood (python-style). If you are in -fno-exceptions
land, then this method is not for you.
Summary of different ways of passing inputs
fn::seq([]{ ... }) % ... // as input-range from a nullary invokable
std::move(vec) % ... // pass a container by-move
vec % ... // pass by-copy
fn::from(vec) % ... // as move-view yielding elements by-move (std::move will make copies iff vec is const)
fn::cfrom(vec) % ... // as above, except always take as const-reference / yield by copy
fn::refs(vec) % ... // as seq taking vec by reference and yielding reference-wrappers
fn::from(it_beg, it_end) % ... // as a move-view into range (std::move will make copies iff const_iterator)
fn::from(beg_end_pair) % ... // as above, as std::pair of iterators
Note: fn::from
can also be used to adapt an lvalue-reference to an Iterable
that implements
begin()
and end()
as free-functions rather than methods.
Primer on using projections
Grouping/sorting/uniqing/where_max_by/etc. functions take a projection function rather than a binary comparator as in std::
algorithms.
// Sort by employee_t::operator<.
employees %= fn::sort(); // same as fn::sort_by( fn::by::identity{} );
// Sort by a projection involving multiple fields (first by last_name, then by first_name):
employees %= fn::sort_by L( std::make_pair( _.last_name, _.first_name ));
// The above may be inefficient (makes copies); prefer returning as tuple of references:
employees %= fn::sort_by L( std::tie( _.last_name, _.first_name ));
// If need to create a mixed tuple capturing lvalues as references and rvalues as values:
employees %= fn::sort_by L( std::make_tuple( _.last_name.size(), std::ref( _.last_name ), std::ref( _.first_name )));
// Equivalently, using fn::tie_lvals(...) that captures lvalues as references like std::tie, and rvalues as values:
employees %= fn::sort_by L( fn::tie_lvals( _.last_name.size(), _.last_name, _.first_name ));
// fn::by::decreasing() and fn::by::decreasing_ref() can wrap individual values or references,
// The wrapper captures the value or reference and exposes inverted operator<.
// E.g. to sort by (last_name's length, last_name descending, first_name):
employees %= fn::sort_by L( fn::tie_lvals( _.last_name.size(), fn::by::decreasing_ref( _.last_name ), _.first_name ));
// fn::by::decreasing() can also wrap the entire projection-function:
employees %= fn::sort_by( fn::by::decreasing L( fn::tie_lvals( _.last_name.size(), _.last_name, _.first_name )));
// If the projection function is expensive, and you want to invoke it once per element:
auto expensive_key_fn = [](const employee_t& e) { return ... };
employees = std::move(employees)
% fn::transform([&](employee_t e)
{
auto key = expensive_key_fn(e);
return std::make_pair( std::move(e), std::move(key) );
})
% fn::sort_by( fn::by::second{}) // or L( std::ref( _.second ))
% fn::transform( fn::get::first{}) // or L( std::move( _.first ))
% fn::to_vector();
// Alternatively, expensive_key_fn can be wrapped with a unary-function-memoizer
// (results are internally cached in std::map and subsequent lookups are log-time).
employees %= fn::sort_by( fn::make_memoized( expensive_key_fn ));
// fn::make_comp() takes a projection and creates a binary Comparator object
// that can be passed to algorithms that require one.
gfx::timsort( employees.begin(), employees.end(),
fn::by::make_comp L( std::tie( _.last_name, _.first_name )));
#include
and compilation-time overhead
Despite packing a lot of functionality, #include <fn.hpp>
adds only a tiny sliver (~0.03s) of compilation-time overhead in addition to the few common standard library include-s that it relies upon:
// tmp.cpp
#if defined(STD_INCLUDES)
# include <stdexcept>
# include <algorithm>
# include <functional>
# include <vector>
# include <map>
# include <deque>
# include <string>
# include <cassert>
#elif defined(INCLUDE_FN)
# include "fn.hpp"
#elif defined(INCLUDE_RANGE_ALL)
# include <range/v3/all.hpp>
#endif
int main()
{
return 0;
}
# just std-includes used by fn.hpp
>>time for i in {1..10}; do g++ -std=c++14 -DSTD_INCLUDES=1 -o tmp.o -c tmp.cpp; done
real 0m3.682s
user 0m3.106s
sys 0m0.499s
# fn.hpp
>>time for i in {1..10}; do g++ -std=c++14 -DINCLUDE_FN=1 -o tmp.o -c tmp.cpp; done
real 0m3.887s
user 0m3.268s
sys 0m0.546s
# range/v3/all.hpp, for comparison
>>time for i in {1..10}; do g++ -std=c++14 -DINCLUDE_RANGE_ALL=1 -I. -o tmp.o -c tmp.cpp; done
real 0m22.687s
user 0m20.412s
sys 0m2.043s
There are not many compiler-torturing metaprogramming techniques used by the library, so the template-instantiation overhead is reasonable as well (see the Top-5 most frequent word example; leaving it as an excercise to the reader to compare with raw-STL-based implementation).
Discussion
Many programmers after getting introduced to toy examples get an impression that
the intended usage is "to express the intent" or "better syntax" or to "separate the concerns", etc.
Others look at the toy examples and point out that they could be straightforwardly written
as normal imperative code, and I tend to agree with them:
There's no real reason to write code like:
std::move(xs)
% fn::where( [](const auto& x) { return x % 2 == 0; })
% fn::transform( [](const auto& x) { return x * x; }
% fn::for_each( [](const auto& x) { std::cout << x << "\n"; })
, if it can be written simply as
for(const auto& x : xs)
if(x % 2 == 0)
std::cerr << x*x << "\n";
The "functional-style" equivalent just incurs additional compile-time, debugging-layers, and possibly run-time overhead.
There are some scenarios where functional-style is useful:
Const-correctness and reduction of mutable state.
When you declare a non-const variable in your code (or worse, when you have to deal with an API that forces you to do that), you introduce another "moving part" in your program. Having more things const makes your code more robust and easier to reason about.
With this library you can express complex data transformations as a single expression, assigning the result to a const variable
(unless you intend to std::move()
it, of course).
const auto result = std::move(inputs) % stage1 % stage2 % ... ;
Another code-pattern for this is immediately-executed-lambda.
const X x = [&]
{
X x{};
// build X...
return x;
}();
Reduced boilerplate.
E.g. compare
employees %= fn::where L( _.last_name != "Doe" );
vs. idiomatic c++ way:
employees.erase(
std::remove_if( employees.begin(),
employees.end(),
[](const employee_t& e)
{
return e.last_name == "Doe";
}),
employees.end());
InputRange
s)
Transformations over infinite (arbitrarily large) streams (The most useful use-case is the scenarios for writing a functional pipeline over an infinite stream that you want to manipulate lazily, like a UNIX pipeline, e.g. the above-mentioned aln_filter.cpp.
Implement embarassingly-parallel problems trivially.
Replacing fn::transform
with fn::transform_in_parallel
where appropriate may be all it takes to parallelize your code.
Downsides and Caveats.
Compilation errors related to templates are completely gnarly.
For this reason the library has many static_asserts
to help you figure things out. If you encounter a compilation error that could benefit from adding a static_assert
, please open an issue.
Sometimes it may be difficult to reason about the complexity space and time requirements of some operations. There are two ways to approach this: 1) Peek into documentation where I discuss space and time big-O for cases that are not obvious (e.g. how lazy_sort_by
differs from regular sort_by
, or how unique_all_by
operates in single-pass for seq
s). 2) Feel free to peek under the hood of the library. Most of the code is intermediate-level c++ that should be within the ability to grasp by someone familiar with STL and <algorithm>
.
There's a possibility that a user may instantiate a seq
and then forget to actually iterate over it, e.g.
std::move(inputs) % fn::transform(...); // creates and immediately destroys a lazy-seq.
// whereas the user code probably intended:
std::move(inputs) % fn::transform(...) % fn::for_each(...);
Minimum supported compilers: MSVC-19.15, GCC-4.9.3, clang-3.7, ICC-18
References:
- Haskell Data.List
- Scala LazyList
- Elixir Stream
- Elm List
- O'Caml List
- F# Collections.Seq
- D std.range
- Rust Iterator
c++ -specific (in no particular order):
- c++20 std::ranges
- tcbrindle/NanoRange
- ericniebler/range-v3
- Dobiasd/FunctionalPlus
- jscheiny/Streams
- A Lazy Stream Implementation in C++11
- ryanhaining/CPPItertools
- soheilhy/fn
- LoopPerfect/conduit
- k06a/boolinq
- pfultz2/linq
- boost.range
- cpplinq
- simonask/rx-ranges
- ReactiveX/RxCpp
- arximboldi/zug
- MarcDirven/cpp-lazy
- qnope/Little-Type-Library
- ftlorg/ftl
Recommended blogs: