• Stars
    star
    131
  • Rank 275,867 (Top 6 %)
  • Language
    Python
  • License
    MIT License
  • Created over 10 years ago
  • Updated almost 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A Python in-memory test stub for BigQuery

tinyquery

tinyquery is a partial implementation of the BigQuery API that runs completely in-memory in Python code. If you have Python code that uses the BigQuery API, you may be able to use tinyquery to make your automated tests orders of magnitude faster, more reliable, and less annoying to run.

There are still lots of missing pieces, but enough of the API and SQL language is filled in that it can be used for some production-size BigQuery pipelines.

Motivation

BigQuery is a Google service that lets you import giant data sets and run arbitrary SQL queries over them. The most common use case is to allow people to dig into their data manually using SQL, but BigQuery also lets you build complex data pipelines entirely in SQL, which has a number of advantages over other approaches like MapReduce.

One of the biggest challenges when writing a data pipeline on top of BigQuery is writing high-quality automated tests. Normally, you're left with a few less-than-perfect options:

  • Skip automated testing and rely only on manual testing.
  • Swap out the BigQuery service with a mock object that asserts the API usage and returns pre-canned results. This lets you test some basic things, but won't give you confidence in your system if most of your business logic is in SQL.
  • Run tests against some other SQL implementation that can be run locally. Since every dialect of SQL is different (and with BigQuery having a dramatically different architecture than a typical relational database), this would likely cause more trouble than it's worth.
  • Run tests against the real BigQuery service. This is probably the best option, but has a few problems:
    • The tests can get very slow as the code gets more complex, since they will involve lots of network I/O over the internet and BigQuery operations tend to have highly-variable response times anyway.
    • Anyone running your test needs to have the right credentials to access BigQuery.
    • Running the test requires an internet connection.
    • If you don't take additional measures, two people running the test at the same time could trample each other's test state and cause confusing test failures.

tinyquery lets you write a test against the real BigQuery API, then swap out BigQuery for a fake implementation for fast iteration. For example, tinyquery was used to dramatically improve a test at Khan Academy that ran a large data pipeline three times in different conditions. Originally, the test took about 8 minutes (and at its worst point took about an hour to run) and required some manual steps. After modifying the test to use tinyquery, the test now takes 2 seconds to run and can be run as part of the regular build process (or manually by any developer) without any extra work.

How to use

tinyquery is a drop-in replacement as long as you're accessing the BigQuery API using the Python client library.

Here's the basic setup code that you would use to get started:

from tinyquery import tinyquery
from tinyquery import api_client
tq_service = tinyquery.TinyQuery()
tq_api_client = api_client.TinyQueryApiClient(tq_service)

The tq_service instance gives you direct Python access to various tinyquery operations. The tq_api_client instance is an API wrapper that you can patch/inject into your production code as a replacement for the return value of apiclient.discovery.build().

If your code catches the apiclient.errors.HttpError exception, you may also want to patch tinyquery.api_client.FakeHttpError with that class.

Features

  • Almost all of the core SQL language: SELECT, FROM, WHERE, HAVING, GROUP BY, JOIN (including LEFT OUTER JOIN and CROSS JOIN), LIMIT, subqueries.
  • Many of the common functions and operators. See runtime.py for a list.
  • Importing from CSV.
  • API wrappers for creating, getting, and deleting tables, and for creating and managing query and copy jobs and getting query results.

What's missing?

  • Many features that behave uniquely with repeated and record fields:
    • FLATTEN
    • WITHIN and scoped aggregation
    • POSITION and NEST
  • Window functions and OVER/PARTITION BY.
  • Various operators and functions.
  • Lots of API operations are unsupported. The ones that are supported are missing various return fields from the API.
  • There are some edge cases in the core language where BigQuery and tinyquery differ:
    • differences involving fully-qualified column names (e.g. table_alias.column being allowed in tinyquery but not BigQuery or vice versa).
    • tinyquery allows SELECTing from multiple repeated fields more often than bigquery does.

Contributing

See CONTRIBUTING.md

License

tinyquery is licensed under the MIT License.

Changelog

1.1

Pinned dependencies to versions.

1.0

Initial release to PyPI.

More Repositories

1

aphrodite

Framework-agnostic CSS-in-JS with support for server-side rendering, browser prefixing, and minimum CSS generation
JavaScript
5,348
star
2

style-guides

Docs for the Organization
Shell
2,136
star
3

khan-exercises

A (deprecated) framework for building exercises to work with Khan Academy.
HTML
1,594
star
4

perseus

Perseus is Khan Academy's exercise question editor and renderer.
TypeScript
1,368
star
5

genqlient

a truly type-safe Go GraphQL client
Go
1,042
star
6

react-components

JavaScript
1,008
star
7

live-editor

A browser-based live coding environment.
JavaScript
754
star
8

flow-to-ts

Convert flow code to typescript
TypeScript
385
star
9

khan-api

Documentation for (and examples of) using the Khan Academy API
375
star
10

gae_mini_profiler

A ubiquitous mini-profiler for Google App Engine, inspired by mvc-mini-profiler
Python
273
star
11

khan-mobile

Youโ€™re probably looking for www.github.com/khan/mobile
JavaScript
239
star
12

Prototope

Swift library of lightweight interfaces for prototyping, bridged to JS
Swift
230
star
13

math-input

math-input = react + redux + mathquill
JavaScript
219
star
14

snippets

Code related to collecting and pushing weekly snippets
Python
205
star
15

structuredjs

Test JavaScript code, look for functionality.
JavaScript
197
star
16

pull-request-comment-trigger

A github action for detecting a "trigger" in a pull request description or comment
JavaScript
190
star
17

react-multi-select

A multiple select component for React
JavaScript
178
star
18

guacamole

General Use Machine Learning for Learning Library
Python
144
star
19

react-render-server

A node.js server for server-side rendering anything!
JavaScript
140
star
20

slicker

a tool for moving things in python
Python
126
star
21

wonder-blocks

React components for Wonder Blocks design system.
TypeScript
123
star
22

KAS

A lightweight JavaScript CAS for comparing expressions and equations.
JavaScript
107
star
23

math-facts

JavaScript
77
star
24

hivemind

Experimental knowledge-management system for Long-term Research references
JavaScript
48
star
25

analytics

Tools to analyze KA logs and other data
Python
45
star
26

kmath

JavaScript Numeric Math Utilities
JavaScript
41
star
27

alertlib

A small library to make it easy to send alerts to various platforms
Python
41
star
28

khan-linter

Lint and code-munging tools for Khan Academy codebase
Python
40
star
29

jenkins-jobs

Scripts and the like that Jenkins jobs can run.
Groovy
35
star
30

react-balance-text

A React wrapper for the Adobe Web Platform's Balance-Text Project
JavaScript
34
star
31

frankenserver

A fork of the Google App Engine SDK with modifications required by Khan Academy
Python
30
star
32

khan-windows

Khan Academy for Windows 8
TypeScript
30
star
33

engblog

KA Engineering blog.
Python
29
star
34

mu-lambda

A small library of functional programming utilities.
JavaScript
25
star
35

gittip-gdoc

Extract records from a Google Doc spreadsheet and bulk set the results on Gittip.
JavaScript
23
star
36

internal-webserver

Code that runs on the khan-academy webserver ec2 instance (for dev tools and the like)
Python
21
star
37

youtube-export

Scripts to download and transcode Khan Academy videos and put them on S3
Python
20
star
38

KhanQuest

Khan Academy the game
JavaScript
18
star
39

culture-cow

NO LONGER USED! This is Culture Cow for HipChat. See Culture Cow code for Slack here: https://github.com/Khan/culture-cron
JavaScript
18
star
40

fuzzy-match-utils

A collection of string matching algorithms designed with React Select in mind.
JavaScript
17
star
41

graphie-to-png

A tool for converting graphie JS code to an image
HTML
17
star
42

react-native-codegen

Generating bindings between js & native via flow types
16
star
43

webapp-i18n-bigfile

The next generation of webapp-i18n, starting in April 2015, that uses git-bigfile to avoid storing large resources.
15
star
44

khan.github.io

An index of some of the open source efforts at Khan Academy
CSS
15
star
45

zendesk-theme

Files for our custom https://khanacademy.zendesk.com/ theme
CSS
15
star
46

typed-context

Sample code for the Khan Academy blog about statically typed context
Go
13
star
47

git-workflow

scripts to enable the git workflow at Khan Academy
Shell
12
star
48

localeplanet

A clone of the l10n files at localeplanet.com.
Python
11
star
49

OhaiPrototope

A prototope bootstrap project in xcode
Swift
11
star
50

dendro

A tool for analyzing dependency trees.
JavaScript
11
star
51

structured-blocks

JavaScript
11
star
52

Early-Math-Prototype-Player

Play the Early Math Prototypes embedded in an iPad app.
Swift
11
star
53

Cantor

Prototypes around a medium and toolset for exploring quantities and arithmetic operations
JavaScript
10
star
54

udp-relay

Fork of Simple UDP proxy at http://aluigi.altervista.org/mytoolz.htm, with modifications required by Khan Academy
C
10
star
55

computing-curriculum

Articles that teach computer programming and computer science on Khan Academy.
HTML
10
star
56

gae-continuous-deploy

A server which polls a repository and automatically deploys to Google App engine
JavaScript
10
star
57

slack-deploy-hooks

Slack outgoing webhooks to power Khan's deployment process
JavaScript
10
star
58

OpenResponses

Prototyping around supporting open-ended responses through peer learning
JavaScript
9
star
59

Early-Math-Prototypes

Exploratory prototypes of interactions for early math
JavaScript
9
star
60

BabyHint

Provide more helpful hints for JavaScript developers.
JavaScript
9
star
61

projects

We aim to provide motivating hands-on learning experiences for students all ages.
Arduino
8
star
62

appengine-mapreduce

A fork of http://code.google.com/p/appengine-mapreduce/ with modifications required by Khan Academy
Python
8
star
63

khan-webhooks

A KA-specific web hook to notify Slack/HipChat about Phabricator and GitHub events.
Python
7
star
64

kotlin-datastore

high-level kotlin library for accessing Google cloud datastore
Kotlin
7
star
65

react-sandbox

Play with your components!
JavaScript
7
star
66

khannotations

A React library for rough, animated, annotations.
TypeScript
7
star
67

KhanAcademy_clr

Khan Academy's colors, accessible via OS X's built-in color picker
7
star
68

BirdAcademy

This experimental learning activity tries to help young children implicitly construct a sense of place value via play.
JavaScript
7
star
69

i18n-babel-plugin

Babel plugin to convert <$_> and <$i18nDoNotTranslate> tags to function calls.
JavaScript
7
star
70

youtube-tools

Miscellaneous tools to modify KA's YouTube videos en masse
Python
6
star
71

translation-assistant

Intelligent translation memory for perseus exercises
JavaScript
6
star
72

wonder-stuff

Packages for sharing features across JavaScript-based projects
TypeScript
6
star
73

beep-boop

Automated issue-frequency HipChat notifier
Python
6
star
74

JSContextBenchmarking

How fast is iOS 7's new web-less JavaScript API?
Swift
6
star
75

todo-tools

Helps you track the TODOs in your codebase
Python
6
star
76

openpyxl

A fork of openpyxl with modifications required by Khan Academy.
Python
6
star
77

ka-clone

manages an isolated local gitconfig for cloned repositories
Python
6
star
78

ActiveQuizTouch

Exploring and expanding on the mechanics and concepts behind Jenova Chen's Active Quiz through multitouch prototyping
JavaScript
6
star
79

khan-i18n

This repo is just used to handle the issues for Khan Academy's internationalization efforts.
5
star
80

declarative-z-indexes

Prevent z-index conflicts by generating them from declarative constraints
JavaScript
5
star
81

two-truths

Two Truths and a Lie Slack Bot
Python
5
star
82

pull-request-workflow-cancel

Conserve resources by cancelling workflow runs for previous commits on a pull-request.
JavaScript
5
star
83

kake

A `make` library in Python
Python
5
star
84

placecomplete

A Select2/jQuery plugin for location autocomplete powered by the Google Maps API
JavaScript
5
star
85

algebra-tool

tool for manipulating algebraic expressions and equations
JavaScript
5
star
86

ka-player

Play your favorite Khan Academy CS programs on your phone
JavaScript
5
star
87

react-build

Configuration for making a custom build of React + ReactART for KA
JavaScript
4
star
88

render-gateway

The core implementation of our render-gateway service
JavaScript
4
star
89

real-time-exercises-dashboard

Dashboard of Khan Academy exercises completed in real time on a map!
JavaScript
4
star
90

khan-mobile-exercises

Edit our CSS to make exercises work great on mobile devices.
CSS
4
star
91

free-response-report

Publication regarding our experiments with open-ended online learning
JavaScript
4
star
92

A11yAnalytics

A tool to help you understand your users' accessibility needs.
Swift
4
star
93

mobile_video_zoom

Utility for producing mobile-friendly video zoom/pan sequences from KA videos
Python
4
star
94

eslint-plugin-khan

eslint plugin with our set of custom rules for various things
JavaScript
4
star
95

pygments-server

A simple server that provides HTTP access to `pygmentize`
Python
4
star
96

tutoring-accuracy-dataset

This repository hosts the paper โ€œLLM Based Math Tutoring: Challenges and Datasetโ€, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
4
star
97

web-workshop

Go
3
star
98

long-term-research-reports

Repository for developing longer-form reports from the Long-term Research team
HTML
3
star
99

canals

Canals: pure URL routing
JavaScript
3
star
100

fastlike

Go
3
star