  • Stars: 103
  • Rank 331,120 (Top 7 %)
  • Language: Python
  • License: BSD 3-Clause "New...
  • Created over 14 years ago
  • Updated over 10 years ago

Repository Details

SQL Query Tool for Static Files

Description

Squawk is a library and command line tool for running SQL queries against structured and semi-structured static files (e.g. Apache logs, CSV files, tcpdump output).

License

BSD

See LICENSE

Goal

The purpose of Squawk is to make querying for data in log files or other structured files easier. Everything that Squawk does can be done by combining various Unix tools, but Squawk makes it even easier to express more complex relationships. It is in no way a database, nor is it meant to be used as one. It is merely a reporting tool.

Squawk can be used from the command line for ad-hoc queries, and it can also be used as a library as a part of a more in-depth reporting tool.

Status

Still in major development. API is guaranteed to change.

Requirements

Supported SQL Features

  • Aggregates: count, min, max, avg, sum
  • GROUP BY
  • ORDER BY (single column)
  • LIMIT
  • OFFSET
  • WHERE
  • Column aliases
  • Subqueries in FROM

Departures from Standard SQL

  • The table list in FROM uses spaces rather than commas as separators. This makes it easier to specify files on the command line (e.g. FROM access.log*).
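
To illustrate the space-separated table list, here is a minimal sketch of how the text of a FROM clause could be split and expanded into file names. The helper `expand_from_tables` is hypothetical and is not part of Squawk's API; how Squawk itself resolves wildcards is not stated here, so treat this as one plausible approach using only the standard `glob` module.

```python
import glob


def expand_from_tables(table_list):
    """Split a space-separated table list and expand shell-style wildcards.

    Hypothetical helper for illustration; not Squawk's actual implementation.
    """
    files = []
    for pattern in table_list.split():
        matches = sorted(glob.glob(pattern))
        # Keep the literal name when nothing matches, so the caller can
        # report a missing file instead of silently dropping it.
        files.extend(matches or [pattern])
    return files
```

Called with `"access.log* error.log"`, the helper would return every file matching `access.log*` in sorted order, followed by `error.log`.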

Parsers

  • Common access log formats (nginx, Apache)
  • CSV
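
As a rough idea of what an access log parser has to do, the sketch below turns log lines into dictionaries, one per request. It assumes the widely used "combined" log layout; the regular expression, the function name `parse_access_log`, and every column name other than `remote_addr` and `status` (which appear in the examples in this README) are assumptions made for illustration, not Squawk's parser.

```python
import re

# Assumed "combined"-style layout; real nginx/Apache configurations may differ.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+|-)'
)


def parse_access_log(lines):
    """Yield one dict per line that matches the assumed layout."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        row = match.groupdict()
        # Convert the status so it can be compared with integers.
        row["status"] = int(row["status"])
        yield row
```

Because the function is a generator, rows are produced one at a time, which suits large log files.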

Output Formats

  • Basic tabular for console (like most database command line tools)
  • JSON
  • CSV
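
The three output formats can be approximated with the Python standard library. The functions `to_table`, `to_json`, and `to_csv` below are hypothetical names chosen for this sketch, and the tabular layout simply imitates the console output shown in the examples; none of this is Squawk's own output code.

```python
import csv
import io
import json


def to_table(rows, columns):
    """Render rows as a simple console table with a header and a rule."""
    lines = ["\t| ".join(columns), "-" * 40]
    for row in rows:
        lines.append("\t| ".join(str(row[c]) for c in columns))
    return "\n".join(lines)


def to_json(rows):
    """Render rows as a JSON array of objects."""
    return json.dumps(list(rows))


def to_csv(rows, columns):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Each function takes rows as dictionaries keyed by column name, the same shape used by the library examples in this README.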

Examples

SQL query on the command line::

$ squawk "SELECT COUNT(1) AS n, status FROM access.log GROUP BY status ORDER BY n DESC"
n	| status
----------------------------------------
381353	| 200
180668	| 302
17976	| 404
12952	| 301
10836	| 304
735	| 403
420	| 206
376	| 416
123	| 400
46	| 500
5	| 502
3	| 408
3	| 405
1	| 504
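
For comparison, the query above amounts to counting rows per status and sorting by the count. A minimal plain-Python sketch of the same computation follows; it assumes the rows are dictionaries with a `status` key, and `count_by_status` is a hypothetical name, not a Squawk function.

```python
from collections import Counter


def count_by_status(rows):
    """Equivalent of: SELECT COUNT(1) AS n, status ... GROUP BY status
    ORDER BY n DESC, for rows given as dicts with a "status" key."""
    counts = Counter(row["status"] for row in rows)
    # most_common() sorts by count, highest first, like ORDER BY n DESC.
    return [{"n": n, "status": status} for status, n in counts.most_common()]
```

Writing this by hand is easy for one query; the SQL form becomes more convenient as filters, multiple aggregates, and limits are combined.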

SQL based query through API::

query = Query(
    "SELECT COUNT(1) AS n, remote_addr"
    " FROM file"
    " WHERE status = 200"
    "  AND remote_addr != '-'"
    " GROUP BY remote_addr"
    " ORDER BY n DESC"
    " LIMIT 10")
source = AccessLogParser("access.log")
output_console(query(source))

# or

query = Query(
    "SELECT COUNT(1) AS n, remote_addr"
    " FROM file"
    " WHERE status = 200"
    "  AND remote_addr != '-'"
    " GROUP BY remote_addr"
    " ORDER BY n DESC"
    " LIMIT 10")
source = AccessLogParser("access.log")
for row in query(source):
    print(row)

Code generated query::

source = AccessLogParser("access.log")
# Keep only rows for successful requests.
filtered = Filter(source, lambda row: row['status'] == 200)
# Count the rows for each remote address.
group_by = GroupBy(filtered, group_by=["remote_addr"], columns=[
    lambda: Column('remote_addr'),
    lambda: CountAggregate(None, 'count(1)')])
# Sort on the count column, then keep the first 10 rows.
order_by = OrderBy(group_by, 'count(1)', True)
limit = LimitOffset(order_by, 10)
for row in limit:
    print(row)
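
The example above chains stages so that each one consumes the rows of the previous one. The sketch below shows one way such a pipeline could be built from plain Python iterators. All function names here (`filter_rows`, `group_count`, `order_by`, `limit_offset`) are hypothetical stand-ins chosen for this illustration; they are not Squawk's classes and make no claim about how Squawk implements them.

```python
from itertools import islice


def filter_rows(source, predicate):
    """Lazily pass through rows for which predicate(row) is true."""
    return (row for row in source if predicate(row))


def group_count(source, key):
    """Count rows per distinct value of `key` (a single-column GROUP BY)."""
    counts = {}
    for row in source:
        counts[row[key]] = counts.get(row[key], 0) + 1
    for value, n in counts.items():
        yield {key: value, "count(1)": n}


def order_by(source, column, descending=False):
    """Sort rows on one column; this stage must read all of its input."""
    return iter(sorted(source, key=lambda row: row[column],
                       reverse=descending))


def limit_offset(source, limit, offset=0):
    """Skip `offset` rows, then yield at most `limit` rows."""
    return islice(source, offset, offset + limit)


# Made-up sample rows standing in for parsed access log entries.
rows = [
    {"remote_addr": "10.0.0.1", "status": 200},
    {"remote_addr": "10.0.0.2", "status": 404},
    {"remote_addr": "10.0.0.1", "status": 200},
    {"remote_addr": "10.0.0.3", "status": 200},
]
filtered = filter_rows(rows, lambda row: row["status"] == 200)
grouped = group_count(filtered, "remote_addr")
ordered = order_by(grouped, "count(1)", descending=True)
result = list(limit_offset(ordered, 10))
```

Filtering and limiting can stream rows one at a time, while grouping and ordering have to consume their whole input before producing anything, which is worth keeping in mind for very large files.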

More Repositories

1

go-zookeeper

Native ZooKeeper client for Go. This project is no longer maintained. Please use https://github.com/go-zookeeper/zk instead.
Go
1,640
star
2

go-thrift

A native Thrift package for Go
Go
385
star
3

python-munin

Python framework for building Munin plugins (also includes some plugins prebuilt).
Python
292
star
4

python-ping

Pure Python version of ICMP ping
Python
221
star
5

kokki

System configuration management framework influenced by Chef
Python
192
star
6

go-opencl

OpenCL bindings for Go
Go
143
star
7

python-bert

BERT serialization library for Python
Python
81
star
8

python-gearman

(maintenance transferred to http://github.com/Yelp/python-gearman) Gearman library for Python.
Python
70
star
9

python-erlastic

Erlang binary term codec and port interface. Modeled after Erlectricity.
Python
49
star
10

go-dsp

DSP and Software Defined Radio (SDR) package for Go
Go
46
star
11

go-ldap

LDAP client and server for Go
Go
42
star
12

go-metrics

Metrics library and aggregation daemon for Go
Go
32
star
13

go-socks

SOCKS5 proxy library for Go
Go
30
star
14

lua-quadtree

Quadtree library for Lua
Lua
26
star
15

go-hackrf

HackRF SDR interface library for Go
Go
22
star
16

go-gettext

Native gettext package for Go
Go
21
star
17

python-scrubber

A whitelisting HTML sanitizer for Python
Python
21
star
18

go-spacecurves

Space filling curves (Hilbert, Morton / Z-order)
Go
16
star
19

go-rtlsdr

RTL-SDR Package for Go
Go
13
star
20

go-pcx

PCX image encoder and decoder for Go
Go
11
star
21

erlang-gearman

Gearman library for Erlang.
Erlang
10
star
22

go-classifier

Document classifier (bayesian) package for Go
Go
9
star
23

go-remez

Parks-McClellan (aka Remez) algorithm
Go
7
star
24

go-cache

In process cache (LRU, ...) package for Go
Go
7
star
25

go-parser

Parsing package for Go
Go
5
star
26

go-emu

Emulators written in Go
Go
5
star
27

go-librato

Go library for Librato Metrics
Go
5
star
28

go-imagex

Image extensions for Go
Go
5
star
29

go-quicklook

Framework for building macOS QuickLook plugins in Go
C
3
star
30

go-redis

Redis client for Go
Go
3
star
31

go-dtw

Dynamic Time Warping (DTW) package for Go
Go
3
star
32

go-astar

A* implementation in Go
Go
3
star
33

TI86-Forth

Forth interpreter for the TI-86 calculator I made back in 1996.
2
star
34

gentlemanjunkie.com

Gentleman Junkie the band
2
star
35

rfexplorer

Go library and tools to communicate with the RF Explorer handheld spectrum analyzer
Go
2
star
36

go-bbcode

BBCode renderer for Go
Go
2
star
37

go-pinboard

Pinboard API client for Go
Go
2
star
38

samuel.github.com

My GitHub homepage
1
star
39

go-accelerate

Go bindings for Apple's accelerate framework
Go
1
star
40

graphics-tutorial

JavaScript
1
star
41

BrinnoTLC100

OS X app to configure the Brinno TLC100 time-lapse camera
Objective-C
1
star