Package influxql implements a parser for the InfluxDB query language.

The Influx Query Language Specification

Introduction

This is a reference for the Influx Query Language ("InfluxQL").

InfluxQL is a SQL-like query language for interacting with InfluxDB. It has been lovingly crafted to feel familiar to those coming from other SQL or SQL-like environments while providing features specific to storing and analyzing time series data.

Notation

The syntax is specified using Extended Backus-Naur Form ("EBNF"), the same notation used in the Go programming language specification. Not so coincidentally, InfluxDB is written in Go.

Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .

Notation operators in order of increasing precedence:

|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)

Comments

Both single and multiline comments are supported. A comment is treated the same as whitespace by the parser.

-- single line comment
/*
    multiline comment
*/

Single line comments will skip all text until the scanner hits a newline. Multiline comments will skip all text until the end comment marker is hit. Nested multiline comments are not supported so the following does not work:

/* /* this does not work */ */

Query representation

Characters

InfluxQL is Unicode text encoded in UTF-8.

newline             = /* the Unicode code point U+000A */ .
unicode_char        = /* an arbitrary Unicode code point except newline */ .

Letters and digits

Letters are the set of ASCII letters plus the underscore character _ (U+005F), which is also considered a letter.

Only decimal digits are supported.

letter              = ascii_letter | "_" .
ascii_letter        = "A" … "Z" | "a" … "z" .
digit               = "0" … "9" .

Identifiers

Identifiers are tokens which refer to database names, retention policy names, user names, measurement names, tag keys, and field keys.

The rules:

  • double quoted identifiers can contain any Unicode character other than a newline
  • double quoted identifiers can contain escaped " characters (i.e., \")
  • double quoted identifiers can contain InfluxQL keywords
  • unquoted identifiers must start with an upper or lowercase ASCII letter or "_"
  • unquoted identifiers may contain only ASCII letters, decimal digits, and "_"

identifier          = unquoted_identifier | quoted_identifier .
unquoted_identifier = ( letter ) { letter | digit } .
quoted_identifier   = `"` unicode_char { unicode_char } `"` .
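
The rules above can be sketched in Go as a syntactic check plus a quoting helper. This is a minimal illustration, not the parser's own code: `isUnquotedIdent` and `quoteIdent` are hypothetical names, the check ignores the keyword list (keywords still need quoting), and only the `\"` escape described above is applied.

```go
package main

import (
	"fmt"
	"strings"
)

// isLetter reports whether c is an ASCII letter or underscore,
// matching the `letter` production.
func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// isDigit reports whether c is a decimal digit.
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// isUnquotedIdent reports whether s matches the unquoted_identifier
// production. It does not consult the keyword list.
func isUnquotedIdent(s string) bool {
	if s == "" || !isLetter(s[0]) {
		return false
	}
	for i := 1; i < len(s); i++ {
		if !isLetter(s[i]) && !isDigit(s[i]) {
			return false
		}
	}
	return true
}

// quoteIdent (hypothetical helper) wraps s in double quotes and escapes
// embedded double quotes as \".
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `\"`) + `"`
}

func main() {
	for _, s := range []string{"cpu", "_cpu_stats", "1h", "anything really"} {
		if isUnquotedIdent(s) {
			fmt.Println(s)
		} else {
			fmt.Println(quoteIdent(s))
		}
	}
}
```

Quoting only when the syntactic check fails keeps simple names readable, but quoting every identifier is always safe.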

Examples:

cpu
_cpu_stats
"1h"
"anything really"
"1_Crazy-1337.identifier>NAME👍"

Keywords

ALL           ALTER         ANALYZE       ANY           AS            ASC
BEGIN         BY            CREATE        CONTINUOUS    DATABASE      DATABASES
DEFAULT       DELETE        DESC          DESTINATIONS  DIAGNOSTICS   DISTINCT
DROP          DURATION      END           EVERY         EXPLAIN       FIELD
FOR           FROM          GRANT         GRANTS        GROUP         GROUPS
IN            INF           INSERT        INTO          KEY           KEYS
KILL          LIMIT         SHOW          MEASUREMENT   MEASUREMENTS  NAME
OFFSET        ON            ORDER         PASSWORD      POLICY        POLICIES
PRIVILEGES    QUERIES       QUERY         READ          REPLICATION   RESAMPLE
RETENTION     REVOKE        SELECT        SERIES        SET           SHARD
SHARDS        SLIMIT        SOFFSET       STATS         SUBSCRIPTION  SUBSCRIPTIONS
TAG           TO            USER          USERS         VALUES        WHERE
WITH          WRITE

Literals

Integers

InfluxQL supports decimal integer literals. Hexadecimal and octal literals are not currently supported.

int_lit             = [ "+" | "-" ] ( "1" … "9" ) { digit } .

Floats

InfluxQL supports floating-point literals. Exponents are not currently supported.

float_lit           = [ "+" | "-" ] ( "." digit { digit } | digit { digit } "." { digit } ) .
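
As a rough illustration, the `int_lit` and `float_lit` productions translate directly into regular expressions. The sketch below follows the grammar exactly as written above (including the nonzero leading digit in `int_lit`); `classify` is a hypothetical helper, and a real scanner may be more permissive.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// int_lit = [ "+" | "-" ] ( "1" … "9" ) { digit } .
	intLit = regexp.MustCompile(`^[+-]?[1-9][0-9]*$`)
	// float_lit = [ "+" | "-" ] ( "." digit { digit } | digit { digit } "." { digit } ) .
	floatLit = regexp.MustCompile(`^[+-]?(\.[0-9]+|[0-9]+\.[0-9]*)$`)
)

// classify reports which numeric production, if any, s matches.
func classify(s string) string {
	switch {
	case intLit.MatchString(s):
		return "int_lit"
	case floatLit.MatchString(s):
		return "float_lit"
	default:
		return "neither"
	}
}

func main() {
	for _, s := range []string{"42", "-7", "3.14", ".5", "10.", "1e9", "0x1F"} {
		fmt.Printf("%-5s %s\n", s, classify(s))
	}
}
```

The last two inputs match neither production, reflecting that exponents and hexadecimal literals are not supported.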

Strings

String literals must be surrounded by single quotes. Strings may contain ' characters as long as they are escaped (i.e., \').

string_lit          = `'` { unicode_char } `'` .

Durations

Duration literals specify a length of time. An integer literal followed immediately (with no spaces) by a duration unit listed below is interpreted as a duration literal.

Duration units

Units    Meaning
u or µ   microseconds (1 millionth of a second)
ms       milliseconds (1 thousandth of a second)
s        second
m        minute
h        hour
d        day
w        week

duration_lit        = int_lit duration_unit .
duration_unit       = "u" | "µ" | "ms" | "s" | "m" | "h" | "d" | "w" .
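
To make the unit table concrete, here is a minimal Go sketch that converts a duration literal into a `time.Duration`. The function name `parseDurationLit` is hypothetical, the sketch accepts any run of decimal digits for the integer part, and it does not guard against overflow.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// units maps each duration_unit to its length.
var units = map[string]time.Duration{
	"u":  time.Microsecond,
	"µ":  time.Microsecond,
	"ms": time.Millisecond,
	"s":  time.Second,
	"m":  time.Minute,
	"h":  time.Hour,
	"d":  24 * time.Hour,
	"w":  7 * 24 * time.Hour,
}

// parseDurationLit splits s into an optionally signed integer and a unit.
// No whitespace is allowed between the two parts.
func parseDurationLit(s string) (time.Duration, error) {
	i := 0
	if i < len(s) && (s[i] == '+' || s[i] == '-') {
		i++
	}
	start := i
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	if i == start {
		return 0, fmt.Errorf("duration %q: missing integer part", s)
	}
	n, err := strconv.ParseInt(s[:i], 10, 64)
	if err != nil {
		return 0, err
	}
	unit, ok := units[s[i:]]
	if !ok {
		return 0, fmt.Errorf("duration %q: unknown unit %q", s, s[i:])
	}
	return time.Duration(n) * unit, nil
}

func main() {
	for _, s := range []string{"10m", "250ms", "5µ", "1w", "10 m"} {
		d, err := parseDurationLit(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s, "=", d)
	}
}
```

Looking up the whole remainder as the unit avoids ambiguity between `m` and `ms`, and it rejects `10 m` because the remainder includes the space.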

Dates & Times

The date and time literal format is not specified in EBNF like the rest of this document. It is specified using Go's date / time parsing format, which is a reference date written in the format required by InfluxQL. The reference date time is:

InfluxQL reference date time: January 2nd, 2006 at 3:04:05 PM

time_lit            = "2006-01-02 15:04:05.999999" | "2006-01-02" .
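
Because the two `time_lit` forms are Go layout strings, they can be passed to `time.Parse` unchanged. The sketch below tries each layout in turn. `parseTimeLit` is a hypothetical helper, and only the two layouts listed above are handled.

```go
package main

import (
	"fmt"
	"time"
)

// timeLayouts holds the two time_lit forms from the grammar.
var timeLayouts = []string{
	"2006-01-02 15:04:05.999999",
	"2006-01-02",
}

// parseTimeLit tries each layout in order and returns the first match.
// time.Parse interprets values without a zone as UTC.
func parseTimeLit(s string) (time.Time, error) {
	for _, layout := range timeLayouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("invalid time literal %q", s)
}

func main() {
	for _, s := range []string{"2015-08-18 00:06:00.5", "2015-08-18", "18/08/2015"} {
		t, err := parseTimeLit(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s, "->", t.UnixNano())
	}
}
```

The fractional seconds in the first layout are optional when parsing, so `2015-08-18 00:06:00` is accepted as well.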

Booleans

bool_lit            = TRUE | FALSE .

Regular Expressions

regex_lit           = "/" { unicode_char } "/" .

Comparators:

=~  matches against
!~  doesn't match against

Note: Use regular expressions to match measurements and tags. You cannot use regular expressions to match databases, retention policies, or fields.

Queries

A query is composed of one or more statements separated by a semicolon.

query               = statement { ";" statement } .

statement           = alter_retention_policy_stmt |
                      create_continuous_query_stmt |
                      create_database_stmt |
                      create_retention_policy_stmt |
                      create_subscription_stmt |
                      create_user_stmt |
                      delete_stmt |
                      drop_continuous_query_stmt |
                      drop_database_stmt |
                      drop_measurement_stmt |
                      drop_retention_policy_stmt |
                      drop_series_stmt |
                      drop_shard_stmt |
                      drop_subscription_stmt |
                      drop_user_stmt |
                      explain_stmt |
                      grant_stmt |
                      kill_query_statement |
                      show_continuous_queries_stmt |
                      show_databases_stmt |
                      show_field_keys_stmt |
                      show_grants_stmt |
                      show_measurements_stmt |
                      show_queries_stmt |
                      show_retention_policies |
                      show_series_stmt |
                      show_shard_groups_stmt |
                      show_shards_stmt |
                      show_subscriptions_stmt |
                      show_tag_keys_stmt |
                      show_tag_values_stmt |
                      show_users_stmt |
                      revoke_stmt |
                      select_stmt .

Statements

ALTER RETENTION POLICY

alter_retention_policy_stmt  = "ALTER RETENTION POLICY" policy_name on_clause
                               retention_policy_option
                               [ retention_policy_option ]
                               [ retention_policy_option ]
                               [ retention_policy_option ] .

Replication factors do not serve a purpose with single node instances.

Examples:

-- Set default retention policy for mydb to 1h.cpu.
ALTER RETENTION POLICY "1h.cpu" ON "mydb" DEFAULT

-- Change duration and replication factor.
ALTER RETENTION POLICY "policy1" ON "somedb" DURATION 1h REPLICATION 4

CREATE CONTINUOUS QUERY

create_continuous_query_stmt = "CREATE CONTINUOUS QUERY" query_name on_clause
                               [ "RESAMPLE" resample_opts ]
                               "BEGIN" select_stmt "END" .

query_name                   = identifier .

resample_opts                = (every_stmt for_stmt | every_stmt | for_stmt) .
every_stmt                   = "EVERY" duration_lit .
for_stmt                     = "FOR" duration_lit .

Examples:

-- selects from DEFAULT retention policy and writes into 6_months retention policy
CREATE CONTINUOUS QUERY "10m_event_count"
ON "db_name"
BEGIN
  SELECT count("value")
  INTO "6_months"."events"
  FROM "events"
  GROUP BY time(10m)
END;

-- this selects from the output of one continuous query in one retention policy and outputs to another series in another retention policy
CREATE CONTINUOUS QUERY "1h_event_count"
ON "db_name"
BEGIN
  SELECT sum("count") as "count"
  INTO "2_years"."events"
  FROM "6_months"."events"
  GROUP BY time(1h)
END;

-- this customizes the resample interval so the interval is queried every 10s and intervals are resampled until 2m after their start time
-- when resample is used, at least one of "EVERY" or "FOR" must be used
CREATE CONTINUOUS QUERY "cpu_mean"
ON "db_name"
RESAMPLE EVERY 10s FOR 2m
BEGIN
  SELECT mean("value")
  INTO "cpu_mean"
  FROM "cpu"
  GROUP BY time(1m)
END;

CREATE DATABASE

create_database_stmt = "CREATE DATABASE" db_name
                       [ WITH
                           [ retention_policy_duration ]
                           [ retention_policy_replication ]
                           [ retention_policy_shard_group_duration ]
                           [ retention_policy_name ]
                       ] .

Replication factors do not serve a purpose with single node instances.

Examples:

-- Create a database called foo
CREATE DATABASE "foo"

-- Create a database called bar with a new DEFAULT retention policy and specify the duration, replication, shard group duration, and name of that retention policy
CREATE DATABASE "bar" WITH DURATION 1d REPLICATION 1 SHARD DURATION 30m NAME "myrp"

-- Create a database called mydb with a new DEFAULT retention policy and specify the name of that retention policy
CREATE DATABASE "mydb" WITH NAME "myrp"

CREATE RETENTION POLICY

create_retention_policy_stmt = "CREATE RETENTION POLICY" policy_name on_clause
                               retention_policy_duration
                               retention_policy_replication
                               [ retention_policy_shard_group_duration ]
                               [ "DEFAULT" ] .

Replication factors do not serve a purpose with single node instances.

Examples:

-- Create a retention policy.
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2

-- Create a retention policy and set it as the DEFAULT.
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2 DEFAULT

-- Create a retention policy and specify the shard group duration.
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2 SHARD DURATION 30m

CREATE SUBSCRIPTION

Subscriptions tell InfluxDB to send all the data it receives to Kapacitor or other third parties.

create_subscription_stmt = "CREATE SUBSCRIPTION" subscription_name "ON" db_name "." retention_policy "DESTINATIONS" ("ANY"|"ALL") host { "," host} .

Examples:

-- Create a SUBSCRIPTION on database 'mydb' and retention policy 'autogen' that sends data to 'example.com:9090' via UDP.
CREATE SUBSCRIPTION "sub0" ON "mydb"."autogen" DESTINATIONS ALL 'udp://example.com:9090'

-- Create a SUBSCRIPTION on database 'mydb' and retention policy 'autogen' that round robins the data to 'h1.example.com:9090' and 'h2.example.com:9090'.
CREATE SUBSCRIPTION "sub0" ON "mydb"."autogen" DESTINATIONS ANY 'udp://h1.example.com:9090', 'udp://h2.example.com:9090'

CREATE USER

create_user_stmt = "CREATE USER" user_name "WITH PASSWORD" password
                   [ "WITH ALL PRIVILEGES" ] .

Examples:

-- Create a normal database user.
CREATE USER "jdoe" WITH PASSWORD '1337password'

-- Create an admin user.
-- Note: Unlike the GRANT statement, the "PRIVILEGES" keyword is required here.
CREATE USER "jdoe" WITH PASSWORD '1337password' WITH ALL PRIVILEGES

Note: The password string must be wrapped in single quotes.

DELETE

delete_stmt = "DELETE" ( from_clause | where_clause | from_clause where_clause ) .

Examples:

DELETE FROM "cpu"
DELETE FROM "cpu" WHERE time < '2000-01-01T00:00:00Z'
DELETE WHERE time < '2000-01-01T00:00:00Z'

DROP CONTINUOUS QUERY

drop_continuous_query_stmt = "DROP CONTINUOUS QUERY" query_name on_clause .

Example:

DROP CONTINUOUS QUERY "myquery" ON "mydb"

DROP DATABASE

drop_database_stmt = "DROP DATABASE" db_name .

Example:

DROP DATABASE "mydb"

DROP MEASUREMENT

drop_measurement_stmt = "DROP MEASUREMENT" measurement .

Examples:

-- drop the cpu measurement
DROP MEASUREMENT "cpu"

DROP RETENTION POLICY

drop_retention_policy_stmt = "DROP RETENTION POLICY" policy_name on_clause .

Example:

-- drop the retention policy named 1h.cpu from mydb
DROP RETENTION POLICY "1h.cpu" ON "mydb"

DROP SERIES

drop_series_stmt = "DROP SERIES" ( from_clause | where_clause | from_clause where_clause ) .

Example:

DROP SERIES FROM "telegraf"."autogen"."cpu" WHERE cpu = 'cpu8'

DROP SHARD

drop_shard_stmt = "DROP SHARD" ( shard_id ) .

Example:

DROP SHARD 1

DROP SUBSCRIPTION

drop_subscription_stmt = "DROP SUBSCRIPTION" subscription_name "ON" db_name "." retention_policy .

Example:

DROP SUBSCRIPTION "sub0" ON "mydb"."autogen"

DROP USER

drop_user_stmt = "DROP USER" user_name .

Example:

DROP USER "jdoe"

EXPLAIN

NOTE: This functionality is unimplemented.

explain_stmt = "EXPLAIN" [ "ANALYZE" ] select_stmt .

GRANT

NOTE: Users can be granted privileges on databases that do not exist.

grant_stmt = "GRANT" privilege [ on_clause ] to_clause .

Examples:

-- grant admin privileges
GRANT ALL TO "jdoe"

-- grant read access to a database
GRANT READ ON "mydb" TO "jdoe"

KILL QUERY

kill_query_statement = "KILL QUERY" query_id .

Examples:

-- kill a query with the query_id 36
KILL QUERY 36

NOTE: Identify the query_id from the SHOW QUERIES output.

SHOW CONTINUOUS QUERIES

show_continuous_queries_stmt = "SHOW CONTINUOUS QUERIES" .

Example:

-- show all continuous queries
SHOW CONTINUOUS QUERIES

SHOW DATABASES

show_databases_stmt = "SHOW DATABASES" .

Example:

-- show all databases
SHOW DATABASES

SHOW FIELD KEYS

show_field_keys_stmt = "SHOW FIELD KEYS" [ from_clause ] .

Examples:

-- show field keys and field value data types from all measurements
SHOW FIELD KEYS

-- show field keys and field value data types from specified measurement
SHOW FIELD KEYS FROM "cpu"

SHOW GRANTS

show_grants_stmt = "SHOW GRANTS FOR" user_name .

Example:

-- show grants for jdoe
SHOW GRANTS FOR "jdoe"

SHOW MEASUREMENTS

show_measurements_stmt = "SHOW MEASUREMENTS" [ on_clause ] [ with_measurement_clause ] [ where_clause ] [ limit_clause ] [ offset_clause ] .

Examples:

-- show all measurements
SHOW MEASUREMENTS

-- show all measurements on all databases
SHOW MEASUREMENTS ON *.*

-- show all measurements on specific database and retention policy
SHOW MEASUREMENTS ON mydb.myrp

-- show measurements where region tag = 'uswest' AND host tag = 'serverA'
SHOW MEASUREMENTS WHERE "region" = 'uswest' AND "host" = 'serverA'

-- show measurements that start with 'h2o'
SHOW MEASUREMENTS WITH MEASUREMENT =~ /h2o.*/

SHOW QUERIES

show_queries_stmt = "SHOW QUERIES" .

Example:

-- show all currently-running queries
SHOW QUERIES

SHOW RETENTION POLICIES

show_retention_policies = "SHOW RETENTION POLICIES" on_clause .

Example:

-- show all retention policies on a database
SHOW RETENTION POLICIES ON "mydb"

SHOW SERIES

show_series_stmt = "SHOW SERIES" [ from_clause ] [ where_clause ] [ limit_clause ] [ offset_clause ] .

Example:

SHOW SERIES FROM "telegraf"."autogen"."cpu" WHERE cpu = 'cpu8'

SHOW SHARD GROUPS

show_shard_groups_stmt = "SHOW SHARD GROUPS" .

Example:

SHOW SHARD GROUPS

SHOW SHARDS

show_shards_stmt = "SHOW SHARDS" .

Example:

SHOW SHARDS

SHOW SUBSCRIPTIONS

show_subscriptions_stmt = "SHOW SUBSCRIPTIONS" .

Example:

SHOW SUBSCRIPTIONS

SHOW TAG KEYS

show_tag_keys_stmt = "SHOW TAG KEYS" [ from_clause ] [ where_clause ] [ group_by_clause ]
                     [ limit_clause ] [ offset_clause ] .

Examples:

-- show all tag keys
SHOW TAG KEYS

-- show all tag keys from the cpu measurement
SHOW TAG KEYS FROM "cpu"

-- show all tag keys from the cpu measurement where the region key = 'uswest'
SHOW TAG KEYS FROM "cpu" WHERE "region" = 'uswest'

-- show all tag keys where the host key = 'serverA'
SHOW TAG KEYS WHERE "host" = 'serverA'

SHOW TAG VALUES

show_tag_values_stmt = "SHOW TAG VALUES" [ from_clause ] with_tag_clause [ where_clause ]
                       [ group_by_clause ] [ limit_clause ] [ offset_clause ] .

Examples:

-- show all tag values across all measurements for the region tag
SHOW TAG VALUES WITH KEY = "region"

-- show tag values from the cpu measurement for the region tag
SHOW TAG VALUES FROM "cpu" WITH KEY = "region"

-- show tag values across all measurements for all tag keys that do not include the letter c
SHOW TAG VALUES WITH KEY !~ /.*c.*/

-- show tag values from the cpu measurement for region & host tag keys where service = 'redis'
SHOW TAG VALUES FROM "cpu" WITH KEY IN ("region", "host") WHERE "service" = 'redis'

SHOW USERS

show_users_stmt = "SHOW USERS" .

Example:

-- show all users
SHOW USERS

REVOKE

revoke_stmt = "REVOKE" privilege [ on_clause ] "FROM" user_name .

Examples:

-- revoke admin privileges from jdoe
REVOKE ALL PRIVILEGES FROM "jdoe"

-- revoke read privileges from jdoe on mydb
REVOKE READ ON "mydb" FROM "jdoe"

SELECT

select_stmt = "SELECT" fields from_clause [ into_clause ] [ where_clause ]
              [ group_by_clause ] [ order_by_clause ] [ limit_clause ]
              [ offset_clause ] [ slimit_clause ] [ soffset_clause ]
              [ timezone_clause ] .

Examples:

-- select mean value from the cpu measurement where region = 'uswest' grouped by 10 minute intervals
SELECT mean("value") FROM "cpu" WHERE "region" = 'uswest' GROUP BY time(10m) fill(0)

-- select from all measurements beginning with cpu into the same measurement name in the cpu_1h retention policy
SELECT mean("value") INTO "cpu_1h".:MEASUREMENT FROM /cpu.*/

-- select from measurements grouped by the day with a timezone
SELECT mean("value") FROM "cpu" GROUP BY region, time(1d) fill(0) tz("America/Chicago")

Clauses

from_clause     = "FROM" measurements .

group_by_clause = "GROUP BY" dimensions fill(fill_option) .

into_clause     = "INTO" ( measurement | back_ref ) .

limit_clause    = "LIMIT" int_lit .

offset_clause   = "OFFSET" int_lit .

slimit_clause   = "SLIMIT" int_lit .

soffset_clause  = "SOFFSET" int_lit .

timezone_clause = tz(string_lit) .

on_clause       = "ON" db_name .

order_by_clause = "ORDER BY" sort_fields .

to_clause       = "TO" user_name .

where_clause    = "WHERE" expr .

with_measurement_clause = "WITH MEASUREMENT" ( "=" measurement | "=~" regex_lit ) .

with_tag_clause = "WITH KEY" ( "=" tag_key | "!=" tag_key | "=~" regex_lit | "IN (" tag_keys ")"  ) .

Expressions

binary_op        = "+" | "-" | "*" | "/" | "%" | "&" | "|" | "^" | "AND" |
                   "OR" | "=" | "!=" | "<>" | "<" | "<=" | ">" | ">=" .

expr             = unary_expr { binary_op unary_expr } .

unary_expr       = "(" expr ")" | var_ref | time_lit | string_lit | int_lit |
                   float_lit | bool_lit | duration_lit | regex_lit .

Other

alias            = "AS" identifier .

back_ref         = ( policy_name ".:MEASUREMENT" ) |
                   ( db_name "." [ policy_name ] ".:MEASUREMENT" ) .

db_name          = identifier .

dimension        = expr .

dimensions       = dimension { "," dimension } .

field_key        = identifier .

field            = expr [ alias ] .

fields           = field { "," field } .

fill_option      = "null" | "none" | "previous" | "linear" | int_lit | float_lit .

host             = string_lit .

measurement      = measurement_name |
                   ( policy_name "." measurement_name ) |
                   ( db_name "." [ policy_name ] "." measurement_name ) .

measurements     = measurement { "," measurement } .

measurement_name = identifier | regex_lit .

password         = string_lit .

policy_name      = identifier .

privilege        = "ALL" [ "PRIVILEGES" ] | "READ" | "WRITE" .

query_id         = int_lit .

query_name       = identifier .

retention_policy = identifier .

retention_policy_option      = retention_policy_duration |
                               retention_policy_replication |
                               retention_policy_shard_group_duration |
                               "DEFAULT" .

retention_policy_duration    = "DURATION" duration_lit .

retention_policy_replication = "REPLICATION" int_lit .

retention_policy_shard_group_duration = "SHARD DURATION" duration_lit .

retention_policy_name = "NAME" identifier .

series_id        = int_lit .

shard_id         = int_lit .

sort_field       = field_key [ "ASC" | "DESC" ] .

sort_fields      = sort_field { "," sort_field } .

subscription_name = identifier .

tag_key          = identifier .

tag_keys         = tag_key { "," tag_key } .

user_name        = identifier .

var_ref          = measurement .

Query Engine Internals

Once you understand the language itself, it's important to know how these language constructs are implemented in the query engine. This gives you an intuitive sense for how results will be processed and how to create efficient queries.

The life cycle of a query looks like this:

  1. The InfluxQL query string is tokenized and then parsed into an abstract syntax tree (AST). This is the code representation of the query itself.

  2. The AST is passed to the QueryExecutor which directs queries to the appropriate handlers. For example, queries related to metadata are executed by the meta service and SELECT statements are executed by the shards themselves.

  3. The query engine then determines the shards that match the SELECT statement's time range. From these shards, iterators are created for each field in the statement.

  4. Iterators are passed to the emitter which drains them and joins the resulting points. The emitter's job is to convert simple time/value points into the more complex result objects that are returned to the client.

Understanding Iterators

Iterators are at the heart of the query engine. They provide a simple interface for looping over a set of points. For example, this is an iterator over Float points:

type FloatIterator interface {
    Next() (*FloatPoint, error)
}
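
As a minimal sketch of what implementing this interface could look like, the following defines a slice-backed iterator. The `FloatPoint` fields, the `sliceIterator` type, and the convention that a nil point signals exhaustion are assumptions made for this illustration rather than the engine's actual definitions.

```go
package main

import "fmt"

// FloatPoint is a simplified, hypothetical stand-in for the engine's point type.
type FloatPoint struct {
	Time  int64
	Value float64
}

// FloatIterator is the interface shown above.
type FloatIterator interface {
	Next() (*FloatPoint, error)
}

// sliceIterator yields points from an in-memory slice.
type sliceIterator struct {
	points []FloatPoint
	pos    int
}

// Next returns the next point, or nil once the slice is exhausted.
func (it *sliceIterator) Next() (*FloatPoint, error) {
	if it.pos >= len(it.points) {
		return nil, nil
	}
	p := &it.points[it.pos]
	it.pos++
	return p, nil
}

// drain reads every remaining point from it.
func drain(it FloatIterator) ([]FloatPoint, error) {
	var out []FloatPoint
	for {
		p, err := it.Next()
		if err != nil {
			return nil, err
		}
		if p == nil {
			return out, nil
		}
		out = append(out, *p)
	}
}

func main() {
	it := &sliceIterator{points: []FloatPoint{{0, 1.5}, {10, 2.5}}}
	points, err := drain(it)
	fmt.Println(len(points), err)
}
```

Any code that loops over points, such as `drain` here, depends only on `Next()`, so the source of the points can change without affecting the consumer.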

These iterators are created through the IteratorCreator interface:

type IteratorCreator interface {
    CreateIterator(m *Measurement, opt IteratorOptions) (Iterator, error)
}

The IteratorOptions provide arguments about field selection, time ranges, and dimensions that the iterator creator can use when planning an iterator. The IteratorCreator interface is used at many levels such as the Shards, Shard, and Engine. This allows optimizations to be performed when applicable such as returning a precomputed COUNT().

Iterators aren't just for reading raw data from storage though. Iterators can be composed so that they provide additional functionality around an input iterator. For example, a DistinctIterator can compute the distinct values for each time window for an input iterator. Or a FillIterator can generate additional points that are missing from an input iterator.

This composition also lends itself well to aggregation. For example, consider this statement:

SELECT MEAN(value) FROM cpu GROUP BY time(10m)

In this case, MEAN(value) is a MeanIterator wrapping an iterator from the underlying shards. We can then add another iterator to determine the derivative of the mean:

SELECT DERIVATIVE(MEAN(value), 20m) FROM cpu GROUP BY time(10m)
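
The wrapping described above can be sketched with toy types. Everything here is hypothetical and simplified: times are in seconds, input is assumed to be time-ordered, a nil point signals exhaustion, and the derivative is taken as the change between consecutive points normalized to the given unit.

```go
package main

import "fmt"

// FloatPoint is a simplified, hypothetical point type (time in seconds).
type FloatPoint struct {
	Time  int64
	Value float64
}

type FloatIterator interface {
	Next() (*FloatPoint, error)
}

// sliceIterator stands in for an iterator coming from the shards.
type sliceIterator struct {
	points []FloatPoint
	pos    int
}

func (it *sliceIterator) Next() (*FloatPoint, error) {
	if it.pos >= len(it.points) {
		return nil, nil // nil point signals exhaustion in this sketch
	}
	p := &it.points[it.pos]
	it.pos++
	return p, nil
}

// meanIterator emits one point per time window holding the mean of the
// input values in that window.
type meanIterator struct {
	input    FloatIterator
	interval int64
	pending  *FloatPoint // first point of the next window, read ahead
}

func (it *meanIterator) Next() (*FloatPoint, error) {
	p := it.pending
	it.pending = nil
	if p == nil {
		var err error
		if p, err = it.input.Next(); err != nil || p == nil {
			return nil, err
		}
	}
	start := p.Time - p.Time%it.interval
	sum, n := p.Value, 1
	for {
		q, err := it.input.Next()
		if err != nil {
			return nil, err
		}
		if q == nil {
			break
		}
		if q.Time-q.Time%it.interval != start {
			it.pending = q // belongs to the next window
			break
		}
		sum += q.Value
		n++
	}
	return &FloatPoint{Time: start, Value: sum / float64(n)}, nil
}

// derivativeIterator emits the rate of change between consecutive input
// points, normalized to unit.
type derivativeIterator struct {
	input FloatIterator
	unit  int64
	prev  *FloatPoint
}

func (it *derivativeIterator) Next() (*FloatPoint, error) {
	for {
		cur, err := it.input.Next()
		if err != nil || cur == nil {
			return nil, err
		}
		prev := it.prev
		it.prev = cur
		if prev != nil {
			elapsed := float64(cur.Time-prev.Time) / float64(it.unit)
			return &FloatPoint{Time: cur.Time, Value: (cur.Value - prev.Value) / elapsed}, nil
		}
	}
}

func main() {
	raw := &sliceIterator{points: []FloatPoint{{0, 1}, {300, 3}, {600, 4}, {900, 8}, {1200, 10}}}
	mean := &meanIterator{input: raw, interval: 600}      // GROUP BY time(10m)
	deriv := &derivativeIterator{input: mean, unit: 1200} // DERIVATIVE(..., 20m)
	for p, _ := deriv.Next(); p != nil; p, _ = deriv.Next() {
		fmt.Println(p.Time, p.Value)
	}
}
```

Each layer sees only a `FloatIterator`, so the derivative does not need to know whether its input is raw data or the output of another aggregate.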

Understanding Auxiliary Fields

Because InfluxQL allows users to use selector functions such as FIRST(), LAST(), MIN(), and MAX(), the engine must provide a way to return related data along with the selected point.

For example, in this query:

SELECT FIRST(value), host FROM cpu GROUP BY time(1h)

We are selecting the first value that occurs every hour but we also want to retrieve the host associated with that point. Since the Point types only specify a single typed Value for efficiency, we push the host into the auxiliary fields of the point. These auxiliary fields are attached to the point until it is passed to the emitter where the fields get split off to their own iterator.
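
A rough Go sketch of the idea follows. The `Aux` field name, the use of second-resolution timestamps, and the `firstPerWindow` helper are assumptions for illustration. The sample values are made up.

```go
package main

import "fmt"

// FloatPoint carries one typed value plus untyped auxiliary fields.
type FloatPoint struct {
	Time  int64
	Value float64
	Aux   []interface{} // e.g. the host associated with the point
}

// firstPerWindow returns the first point of each time window, assuming
// time-ordered input. The selected point keeps its auxiliary fields.
func firstPerWindow(points []FloatPoint, interval int64) []FloatPoint {
	var out []FloatPoint
	for _, p := range points {
		start := p.Time - p.Time%interval
		if len(out) > 0 && out[len(out)-1].Time == start {
			continue // a first point was already selected for this window
		}
		out = append(out, FloatPoint{Time: start, Value: p.Value, Aux: p.Aux})
	}
	return out
}

func main() {
	points := []FloatPoint{
		{Time: 0, Value: 0.64, Aux: []interface{}{"server01"}},
		{Time: 1800, Value: 0.70, Aux: []interface{}{"server02"}},
		{Time: 3600, Value: 0.55, Aux: []interface{}{"server02"}},
		{Time: 5400, Value: 0.61, Aux: []interface{}{"server01"}},
	}
	// Emitter step: split the auxiliary field into its own output column.
	for _, p := range firstPerWindow(points, 3600) {
		fmt.Println(p.Time, p.Value, p.Aux[0])
	}
}
```

Carrying the host alongside the value means the selector never has to look the original point up again once the window is closed.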

Built-in Iterators

There are many helper iterators that let us build queries:

  • Merge Iterator - This iterator combines one or more iterators into a single new iterator of the same type. This iterator guarantees that all points within a window will be output before starting the next window but does not provide ordering guarantees within the window. This allows for fast access for aggregate queries which do not need stronger sorting guarantees.

  • Sorted Merge Iterator - This iterator also combines one or more iterators into a new iterator of the same type. However, this iterator guarantees time ordering of every point. This makes it slower than the MergeIterator but this ordering guarantee is required for non-aggregate queries which return the raw data points.

  • Limit Iterator - This iterator limits the number of points per name/tag group. This is the implementation of the LIMIT & OFFSET syntax.

  • Fill Iterator - This iterator injects extra points if they are missing from the input iterator. It can provide null points, points with the previous value, or points with a specific value.

  • Buffered Iterator - This iterator provides the ability to "unread" a point back onto a buffer so it can be read again next time. This is used extensively to provide lookahead for windowing.

  • Reduce Iterator - This iterator calls a reduction function for each point in a window. When the window is complete then all points for that window are output. This is used for simple aggregate functions such as COUNT().

  • Reduce Slice Iterator - This iterator collects all points for a window first and then passes them all to a reduction function at once. The results are returned from the iterator. This is used for aggregate functions such as DERIVATIVE().

  • Transform Iterator - This iterator calls a transform function for each point from an input iterator. This is used for executing binary expressions.

  • Dedupe Iterator - This iterator only outputs unique points. It is resource intensive so it is only used for small queries such as meta query statements.
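
As one example from the list above, a buffered iterator can be sketched in a few lines. The types and the `unread` and `nextWindow` names are hypothetical. The sketch assumes time-ordered input and treats a nil point as end of input.

```go
package main

import "fmt"

type FloatPoint struct {
	Time  int64
	Value float64
}

type FloatIterator interface {
	Next() (*FloatPoint, error)
}

type sliceIterator struct {
	points []FloatPoint
	pos    int
}

func (it *sliceIterator) Next() (*FloatPoint, error) {
	if it.pos >= len(it.points) {
		return nil, nil
	}
	p := &it.points[it.pos]
	it.pos++
	return p, nil
}

// bufIterator wraps an input iterator and holds at most one unread point.
type bufIterator struct {
	input FloatIterator
	buf   *FloatPoint
}

// Next returns the unread point if there is one, else reads from the input.
func (it *bufIterator) Next() (*FloatPoint, error) {
	if it.buf != nil {
		p := it.buf
		it.buf = nil
		return p, nil
	}
	return it.input.Next()
}

// unread pushes p back so that the next call to Next returns it again.
func (it *bufIterator) unread(p *FloatPoint) { it.buf = p }

// nextWindow returns all points in the window of the next available point.
// It uses unread for lookahead: the first point past the window is pushed back.
func nextWindow(it *bufIterator, interval int64) ([]FloatPoint, error) {
	var window []FloatPoint
	var end int64
	for {
		p, err := it.Next()
		if err != nil {
			return nil, err
		}
		if p == nil {
			return window, nil
		}
		if len(window) == 0 {
			end = p.Time - p.Time%interval + interval
		} else if p.Time >= end {
			it.unread(p)
			return window, nil
		}
		window = append(window, *p)
	}
}

func main() {
	src := &sliceIterator{points: []FloatPoint{{0, 1}, {5, 2}, {10, 3}, {25, 4}}}
	it := &bufIterator{input: src}
	for {
		w, err := nextWindow(it, 10)
		if err != nil || len(w) == 0 {
			break
		}
		fmt.Println(w[0].Time-w[0].Time%10, len(w))
	}
}
```

Without the ability to unread, the point that reveals the end of a window would be lost to the next window.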

Call Iterators

Function calls in InfluxQL are implemented at two levels. Some calls can be wrapped at multiple layers to improve efficiency. For example, a COUNT() can be performed at the shard level and then multiple CountIterators can be wrapped with another CountIterator to compute the count of all shards. These iterators can be created using NewCallIterator().

Some iterators are more complex or need to be implemented at a higher level. For example, the DERIVATIVE() needs to retrieve all points for a window first before performing the calculation. This iterator is created by the engine itself and is never requested to be created by the lower levels.
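
The layered COUNT() described above boils down to counting per shard and then adding the partial counts. The sketch below shows only that arithmetic with hypothetical helper names. It does not use NewCallIterator() or any engine types.

```go
package main

import "fmt"

// countPoints plays the role of a shard-level COUNT(): it reduces one
// shard's values to a single partial count.
func countPoints(values []float64) int64 {
	return int64(len(values))
}

// combineCounts plays the role of the outer COUNT(): it adds up the
// partial counts produced by each shard.
func combineCounts(partials []int64) int64 {
	var total int64
	for _, c := range partials {
		total += c
	}
	return total
}

func main() {
	// Three hypothetical shards holding values for the same series.
	shards := [][]float64{{0.5, 0.7}, {0.9}, {0.1, 0.2, 0.3}}

	partials := make([]int64, 0, len(shards))
	for _, values := range shards {
		partials = append(partials, countPoints(values))
	}
	fmt.Println(partials, combineCounts(partials))
}
```

Only one number per shard crosses the shard boundary, which is why pushing the call down is cheaper than shipping every point to a single counter. A function such as DERIVATIVE() cannot be split this way because it needs the points themselves.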

Subqueries

Subqueries are built on top of iterators. Most of the work involved in supporting subqueries is in organizing how data is streamed to the iterators that will process the data.

The final ordering of the stream has to output all points from one series before moving to the next series and it also needs to ensure those points are printed in order. So there are two separate concepts we need to consider when creating an iterator: ordering and grouping.

When an inner query has a different grouping than the outermost query, we still need to group together related points into buckets, but we do not have to ensure that all points from one bucket are output before the points in another bucket. In fact, if we do that, we will be unable to perform the grouping for the outer query correctly. Instead, we group all points by the outermost query for an interval and then, within that interval, we group the points for the inner query. For example, here are series keys and times in seconds (fields are omitted since they don't matter in this example):

cpu,host=server01 0
cpu,host=server01 10
cpu,host=server01 20
cpu,host=server01 30
cpu,host=server02 0
cpu,host=server02 10
cpu,host=server02 20
cpu,host=server02 30

With the following query:

SELECT mean(max) FROM (SELECT max(value) FROM cpu GROUP BY host, time(20s)) GROUP BY time(20s)

The final grouping keeps all of the points together which means we need to group server01 with server02. That means we output the points from the underlying engine like this:

cpu,host=server01 0
cpu,host=server01 10
cpu,host=server02 0
cpu,host=server02 10
cpu,host=server01 20
cpu,host=server01 30
cpu,host=server02 20
cpu,host=server02 30

Within each one of those time buckets, we calculate the max() value for each unique host so the output stream gets transformed to look like this:

cpu,host=server01 0
cpu,host=server02 0
cpu,host=server01 20
cpu,host=server02 20

Then we can process the mean() on this stream of data instead and it will be output in the correct order. This is true of any order of grouping since grouping can only go from more specific to less specific.
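
The two-level grouping can be checked with a small in-memory sketch. It does not stream points the way the engine does, the field values are made up (the example above omits them), and `meanOfMax` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

type point struct {
	host  string
	time  int64
	value float64
}

// meanOfMax first computes max(value) per host within each time bucket
// (the inner query), then the mean of those maxima per bucket (the outer
// query). Buckets are returned in time order.
func meanOfMax(points []point, interval int64) (times []int64, means []float64) {
	maxes := map[int64]map[string]float64{}
	for _, p := range points {
		start := p.time - p.time%interval
		if maxes[start] == nil {
			maxes[start] = map[string]float64{}
		}
		if cur, ok := maxes[start][p.host]; !ok || p.value > cur {
			maxes[start][p.host] = p.value
		}
	}
	for start := range maxes {
		times = append(times, start)
	}
	sort.Slice(times, func(i, j int) bool { return times[i] < times[j] })
	for _, start := range times {
		sum := 0.0
		for _, v := range maxes[start] {
			sum += v
		}
		means = append(means, sum/float64(len(maxes[start])))
	}
	return times, means
}

func main() {
	points := []point{
		{"server01", 0, 1}, {"server01", 10, 3}, {"server01", 20, 5}, {"server01", 30, 2},
		{"server02", 0, 4}, {"server02", 10, 2}, {"server02", 20, 1}, {"server02", 30, 7},
	}
	times, means := meanOfMax(points, 20)
	for i := range times {
		fmt.Println(times[i], means[i])
	}
}
```

With these values, the first bucket has maxima 3 and 4 and the second has 5 and 7, so the outer means are 3.5 and 6.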

When it comes to ordering, unordered data is faster to process, but we always need to produce ordered data. When processing a raw query with no aggregates, we need to ensure data coming from the engine is ordered so the output is ordered. When we have an aggregate, we know one point is being emitted for each interval and will always produce ordered output. So for aggregates, we can take unordered data as the input and get ordered output. Any ordered data as input will always result in ordered data so we just need to look at how an iterator processes unordered data.

                 raw query         selector (without group by time)  selector (with group by time)  aggregator
ordered input    ordered output    ordered output                    ordered output                 ordered output
unordered input  unordered output  unordered output                  ordered output                 ordered output

Since we always need ordered output, we just need to work backwards and determine which pattern of input gives us ordered output. If both ordered and unordered input produce ordered output, we prefer unordered input since it is faster.

There are also certain aggregates that require ordered input like median() and percentile(). These functions will explicitly request ordered input. It is also important to realize that selectors that are grouped by time are the equivalent of an aggregator. It is only selectors without a group by time that are different.
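
The decision described above can be written down as a small function. This is only an encoding of the rules in this section, using hypothetical names (`queryKind`, `needsOrderedInput`), not the planner's actual logic.

```go
package main

import "fmt"

type queryKind int

const (
	rawQuery queryKind = iota
	selector
	aggregator
)

// needsOrderedInput reports whether the input must be ordered to get
// ordered output. orderSensitive marks functions such as median() and
// percentile() that explicitly request ordered input.
func needsOrderedInput(kind queryKind, groupByTime, orderSensitive bool) bool {
	if orderSensitive {
		return true
	}
	switch kind {
	case rawQuery:
		return true
	case selector:
		// A selector grouped by time behaves like an aggregator.
		return !groupByTime
	default:
		return false
	}
}

func main() {
	fmt.Println("raw query:", needsOrderedInput(rawQuery, false, false))
	fmt.Println("selector without group by time:", needsOrderedInput(selector, false, false))
	fmt.Println("selector with group by time:", needsOrderedInput(selector, true, false))
	fmt.Println("aggregator:", needsOrderedInput(aggregator, true, false))
	fmt.Println("median():", needsOrderedInput(aggregator, true, true))
}
```

Whenever the function returns false, unordered input is preferred because it is faster to produce.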

Go
80
star
34

inch

An InfluxDB benchmarking tool.
Go
78
star
35

influxdata-operator

A k8s operator for InfluxDB
Go
76
star
36

docs-v2

InfluxData Documentation that covers InfluxDB Cloud, InfluxDB OSS 2.x, InfluxDB OSS 1.x, InfluxDB Enterprise, Telegraf, Chronograf, Kapacitor, and Flux.
SCSS
72
star
37

wirey

Manage local wireguard interfaces in a distributed system
Go
66
star
38

influx-cli

CLI for managing resources in InfluxDB v2
Go
63
star
39

influxdb-go

61
star
40

terraform-aws-influx

Reusable infrastructure modules for running TICK stack on AWS
HCL
51
star
41

influxdb2-sample-data

Sample data for InfluxDB 2.0
JavaScript
46
star
42

influxdb-observability

Go
46
star
43

influxdb-client-ruby

InfluxDB 2.0 Ruby Client
Ruby
45
star
44

clockface

UI Kit for building Chronograf
TypeScript
44
star
45

grade

Track Go benchmark performance over time by storing results in InfluxDB
Go
43
star
46

influxdb-r

R library for InfluxDB
R
43
star
47

nginx-influxdb-module

C
39
star
48

nifi-influxdb-bundle

InfluxDB Processors For Apache NiFi
Java
36
star
49

line-protocol

Go
36
star
50

tensorflow-influxdb

Jupyter Notebook
34
star
51

iot-center-flutter

InlfuxDB 2.0 dart client flutter demo
Dart
34
star
52

whisper-migrator

A tool for migrating data from Graphite Whisper files to InfluxDB TSM files (version 0.10.0).
Go
33
star
53

flightsql-dbapi

DB API 2 interface for Flight SQL with SQLAlchemy extras.
Python
32
star
54

kube-influxdb

Configuration to monitor Kubernetes with the TICK stack
Shell
31
star
55

k8s-kapacitor-autoscale

Demonstration of using Kapacitor to autoscale a k8s deployment
Go
30
star
56

terraform-aws-influxdb

Deploys InfluxDB Enterprise to AWS
HCL
29
star
57

catslack

Shell -> Slack the easy way
Go
28
star
58

flux-lsp

Implementation of Language Server Protocol for the flux language
Rust
27
star
59

influxdb-operator

The Kubernetes operator for InfluxDB and the TICK stack.
Go
27
star
60

influxdb3_core

InfluxData's core functionality for InfluxDB Edge and IOx
Rust
26
star
61

influxdb-client-swift

InfluxDB (v2+) Client Library for Swift
Swift
26
star
62

influxdb-client-dart

InfluxDB (v2+) Client Library for Dart and Flutter
Dart
25
star
63

kapacitor-course

25
star
64

influxdb-c

C
25
star
65

vsflux

Flux language extension for VSCode
TypeScript
25
star
66

grafana-flightsql-datasource

Grafana plugin for Flight SQL APIs.
TypeScript
25
star
67

ansible-chrony

A role to manage chrony on Linux systems
Ruby
24
star
68

influxdb-scala

Scala client for InfluxDB
Scala
22
star
69

cron

A fast, zero-allocation cron parser in ragel and golang
Go
21
star
70

influxdb-plugin-fluent

A buffered output plugin for Fluentd and InfluxDB 2
Ruby
21
star
71

terraform-google-influx

Reusable infrastructure modules for running TICK stack on GCP
Shell
20
star
72

iot-api-python

Python
18
star
73

openapi

An OpenAPI specification for influx (cloud/oss) apis.
Shell
17
star
74

influxdb-university

InfluxDB University
Python
16
star
75

influxdb-client-r

InfluxDB (v2+) Client R Package
R
14
star
76

kafka-connect-influxdb

InfluxDB 2 Connector for Kafka
Scala
13
star
77

cd-gitops-reference-architecture

Details of the CD/GitOps architecture in use at InfluxData
Shell
13
star
78

iot-api-ui

Common React UI for iot-api-<js, python, etc.> example apps designed for InfluxDB client library tutorials.
TypeScript
13
star
79

oats

An OpenAPI to TypeScript generator.
TypeScript
12
star
80

awesome

SCSS
12
star
81

windows-packager

Create a windows installer
Shell
12
star
82

influxdb-gds-connector

Google Data Studio Connector for InfluxDB.
JavaScript
11
star
83

promql

Go
11
star
84

object_store_rs

Rust
10
star
85

yarpc

Yet Another RPC for Go
Go
10
star
86

ansible-influxdb-enterprise

Ansible role for deploying InfluxDB Enterprise.
10
star
87

influxdb-sample-data

Sample time series data used to test InfluxDB
9
star
88

ingen

ingen is a tool for directly generating TSM data
Go
9
star
89

parquet-bloom-filter-analysis

Generate Parquet Files
Rust
8
star
90

ansible-kapacitor

Official Kapacitor Ansible Role for Linux
Jinja
7
star
91

wlog

Simple log level based Go logger.
Go
7
star
92

iot-api-js

An example IoT app built with NextJS (NodeJS + React) and the InfluxDB API client library for Javascript.
JavaScript
7
star
93

influxdb-iox-client-go

InfluxDB/IOx Client for Go
Go
7
star
94

influxdb-templates

This repo is a collection of dashboard templates used in the InfluxDB UI.
JavaScript
7
star
95

k8s-jsonnet-libs

Jsonnet Libs repo - mostly generated with jsonnet-libs/k8s project
Jsonnet
7
star
96

google-deployment-manager-influxdb-enterprise

GCP Deployment Manager templates for InfluxDB Enterprise.
HTML
6
star
97

jaeger-influxdb

Go
6
star
98

influxdb-action

A GitHub action for setting up and configuring InfluxDB and the InfluxDB Cloud CLI
Shell
6
star
99

influxdb-fsharp

A F# client library for InfluxDB, a time series database http://influxdb.com
F#
6
star
100

qprof

A tool for profiling the performance of InfluxQL queries
Go
6
star