• Stars
    star
    10,069
  • Rank 3,465 (Top 0.07 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created almost 13 years ago
  • Updated 11 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

q - Run SQL directly on delimited files and multi-file sqlite databases

Build and Package

q - Text as Data

q's purpose is to bring SQL expressive power to the Linux command line and to provide easy access to text as actual data.

q allows the following:

  • Performing SQL-like statements directly on tabular text data, auto-caching the data in order to accelerate additional querying on the same file.
  • Performing SQL statements directly on multi-file sqlite3 databases, without having to merge them or load them into memory

The following table shows the impact of using caching:

Rows Columns File Size Query time without caching Query time with caching Speed Improvement
5,000,000 100 4.8GB 4 minutes, 47 seconds 1.92 seconds x149
1,000,000 100 983MB 50.9 seconds 0.461 seconds x110
1,000,000 50 477MB 27.1 seconds 0.272 seconds x99
100,000 100 99MB 5.2 seconds 0.141 seconds x36
100,000 50 48MB 2.7 seconds 0.105 seconds x25

Notice that for the current version, caching is not enabled by default, since the caches take disk space. Use -C readwrite or -C read to enable it for a query, or add caching_mode to .qrc to set a new default.

q's web site is https://harelba.github.io/q/ or https://q.textasdata.wiki It contains everything you need to download and use q immediately.

Usage Examples

q treats ordinary files as database tables, and supports all SQL constructs, such as WHERE, GROUP BY, JOINs, etc. It supports automatic column name and type detection, and provides full support for multiple character encodings.

Here are some example commands to get the idea:

$ q "SELECT COUNT(*) FROM ./clicks_file.csv WHERE c3 > 32.3"

$ ps -ef | q -H "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"

$ q "select count(*) from some_db.sqlite3:::albums a left join another_db.sqlite3:::tracks t on (a.album_id = t.album_id)"

Detailed examples are in here

Installation.

New Major Version 3.1.6 is out with a lot of significant additions.

Instructions for all OSs are here.

The previous version 2.0.19 Can still be downloaded from here

Contact

Any feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.

Linkedin: Harel Ben Attia

Twitter @harelba

Email [email protected]

q on twitter: #qtextasdata

Patreon: harelba - All the money received is donated to the Center for the Prevention and Treatment of Domestic Violence in my hometown - Ramla, Israel.