BigQuery Emulator

BigQuery emulator server implemented in Go.
BigQuery emulator provides a way to launch a BigQuery server on your local machine for testing and development.

Features

  • If you use Go as your BigQuery client language, you can launch the BigQuery emulator in the same process as your tests by using httptest.
  • BigQuery emulator can be built as a static single binary and launched as a standalone process. By specifying the address of the running emulator, you can use it from programs written in non-Go languages or from tools such as the bq command.
  • BigQuery emulator uses SQLite for storage. At startup you can select either memory or a file as the data storage destination; if you choose a file, the data is persisted.
  • You can load seed data from a YAML file on startup.

Status

Although this project is still in beta, many features are already available.

BigQuery API

We have implemented all BigQuery APIs except the API to manipulate IAM resources. Some options may not be supported yet; if you find one, please report it in an Issue.

Google Cloud Storage linkage

BigQuery emulator supports loading data from Google Cloud Storage and extracting table data. Currently, only the CSV and JSON formats can be used for extraction. If you use a Google Cloud Storage emulator, set the STORAGE_EMULATOR_HOST environment variable.

BigQuery Storage API

Supports gRPC-based read/write using BigQuery Storage API. Supports both Apache Avro and Arrow formats.

Google Standard SQL

BigQuery emulator supports many of the specifications present in Google Standard SQL. For example, it has the following features.

  • 200+ standard functions
  • Wildcard table
  • Templated Argument Function
  • JavaScript UDF

If you want to know the specific features supported, please see here

Goals and Sponsors

The goal of this project is to build a server that behaves exactly like BigQuery from the BigQuery client's perspective. To do so, we need to support all features present in BigQuery (Model API, Connection API, INFORMATION SCHEMA, etc.) in addition to evaluating Google Standard SQL.

However, this is a personal project that I develop on my days off and after work. I work full time and maintain a lot of OSS, so the time available for this project is limited. I will keep adding features and fixing bugs regularly to get closer to this goal, but if you want me to implement the features you need, please consider sponsoring me. You can of course use this project for free, but sponsorship is a real motivation for me. If you are part of a commercial company that uses this project, I would be especially glad if you considered sponsoring me.

Install

If Go is installed, you can install the latest version with the following command:

$ go install github.com/goccy/bigquery-emulator/cmd/bigquery-emulator@latest

The BigQuery emulator depends on go-zetasql. This library takes a very long time to install because it automatically builds the ZetaSQL library during installation. The build may appear to hang because it logs nothing, but as long as a clang process is running in the background it is working fine, so just wait it out. For the same reason, the following environment variables must be set for installation.

CGO_ENABLED=1
CXX=clang++

You can also pull the Docker image with the following command:

$ docker pull ghcr.io/goccy/bigquery-emulator:latest

You can also download the darwin (amd64) and linux (amd64) binaries directly from the releases page.

How to start the standalone server

Once the bigquery-emulator CLI is installed, you can start the server using the following options.

$ ./bigquery-emulator -h
Usage:
  bigquery-emulator [OPTIONS]

Application Options:
      --project=        specify the project name
      --dataset=        specify the dataset name
      --port=           specify the http port number. this port used by bigquery api (default: 9050)
      --grpc-port=      specify the grpc port number. this port used by bigquery storage api (default: 9060)
      --log-level=      specify the log level (debug/info/warn/error) (default: error)
      --log-format=     specify the log format (console/json) (default: console)
      --database=       specify the database file if required. if not specified, it will be on memory
      --data-from-yaml= specify the path to the YAML file that contains the initial data
  -v, --version         print version

Help Options:
  -h, --help            Show this help message
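
When the emulator is launched from a test harness rather than a shell, it can help to build the argument list in code. The following is a minimal sketch that maps a few of the flags from the help output above onto a struct. The EmulatorOptions type and its Args method are hypothetical helpers written for this example and are not part of bigquery-emulator; the binary path is also an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// EmulatorOptions is a hypothetical helper type (not provided by
// bigquery-emulator). Each field corresponds to one documented CLI flag.
type EmulatorOptions struct {
	Project  string // --project
	Dataset  string // --dataset
	Port     int    // --port (REST, default 9050)
	GRPCPort int    // --grpc-port (Storage API, default 9060)
	Database string // --database (empty means in-memory)
	SeedYAML string // --data-from-yaml
}

// Args renders the options as command-line arguments.
// Zero-valued fields are skipped so the emulator falls back to its defaults.
func (o EmulatorOptions) Args() []string {
	args := []string{"--project=" + o.Project}
	if o.Dataset != "" {
		args = append(args, "--dataset="+o.Dataset)
	}
	if o.Port != 0 {
		args = append(args, "--port="+strconv.Itoa(o.Port))
	}
	if o.GRPCPort != 0 {
		args = append(args, "--grpc-port="+strconv.Itoa(o.GRPCPort))
	}
	if o.Database != "" {
		args = append(args, "--database="+o.Database)
	}
	if o.SeedYAML != "" {
		args = append(args, "--data-from-yaml="+o.SeedYAML)
	}
	return args
}

func main() {
	opts := EmulatorOptions{Project: "test", Dataset: "dataset1", Database: "./emulator.db"}
	// The binary path is an assumption; adjust it to where the CLI is installed.
	cmd := exec.Command("./bigquery-emulator", opts.Args()...)
	// Only print the command here; call cmd.Start() to actually launch it.
	fmt.Println(cmd.Args)
}
```

Leaving Database empty keeps the data in memory, which matches the documented default of the --database flag.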

Start the server by specifying the project name

$ ./bigquery-emulator --project=test
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060

To start the emulator from the Docker image, run the following.

$ docker run -it ghcr.io/goccy/bigquery-emulator:latest --project=test
  • If you are using an M1 Mac (and Docker Desktop), you may get a warning. In that case, pass the --platform linux/x86_64 option.

How to use from bq client

1. Start the standalone server

$ ./bigquery-emulator --project=test --data-from-yaml=./server/testdata/data.yaml
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060
  • server/testdata/data.yaml is here

2. Call endpoint from bq client

$ bq --api http://0.0.0.0:9050 query --project_id=test "SELECT * FROM dataset1.table_a WHERE id = 1"

+----+-------+---------------------------------------------+------------+----------+---------------------+
| id | name  |                  structarr                  |  birthday  | skillNum |     created_at      |
+----+-------+---------------------------------------------+------------+----------+---------------------+
|  1 | alice | [{"key":"profile","value":"{\"age\": 10}"}] | 2012-01-01 |        3 | 2022-01-01 12:00:00 |
+----+-------+---------------------------------------------+------------+----------+---------------------+

How to use from Python client

1. Start the standalone server

$ ./bigquery-emulator --project=test --dataset=dataset1
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060

2. Call endpoint from Python client

Create ClientOptions with api_endpoint option and use AnonymousCredentials to disable authentication.

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery
from google.cloud.bigquery import QueryJobConfig

client_options = ClientOptions(api_endpoint="http://0.0.0.0:9050")
client = bigquery.Client(
  "test",
  client_options=client_options,
  credentials=AnonymousCredentials(),
)
client.query(query="...", job_config=QueryJobConfig())

If you use a DataFrame as the destination for the query results, you must either disable the BigQueryStorage client with create_bqstorage_client=False or create a BigQueryStorage client that points at the local gRPC port (default 9060).

https://cloud.google.com/bigquery/docs/samples/bigquery-query-results-dataframe?hl=en

result = client.query(sql).to_dataframe(create_bqstorage_client=False)

or

from google.cloud import bigquery_storage

client_options = ClientOptions(api_endpoint="0.0.0.0:9060")
read_client = bigquery_storage.BigQueryReadClient(client_options=client_options)
result = client.query(sql).to_dataframe(bqstorage_client=read_client)

Synopsis

If you use Go as your BigQuery client language, you can launch the BigQuery emulator in the same process as your tests.
Import github.com/goccy/bigquery-emulator/server (and github.com/goccy/bigquery-emulator/types), then use the server.New API to create the emulator server instance.

See the API reference for more information: https://pkg.go.dev/github.com/goccy/bigquery-emulator

package main

import (
  "context"
  "fmt"

  "cloud.google.com/go/bigquery"
  "github.com/goccy/bigquery-emulator/server"
  "github.com/goccy/bigquery-emulator/types"
  "google.golang.org/api/iterator"
  "google.golang.org/api/option"
)

func main() {
  ctx := context.Background()
  const (
    projectID = "test"
    datasetID = "dataset1"
    routineID = "routine1"
  )
  bqServer, err := server.New(server.TempStorage)
  if err != nil {
    panic(err)
  }
  if err := bqServer.Load(
    server.StructSource(
      types.NewProject(
        projectID,
        types.NewDataset(
          datasetID,
        ),
      ),
    ),
  ); err != nil {
    panic(err)
  }
  if err := bqServer.SetProject(projectID); err != nil {
    panic(err)
  }
  testServer := bqServer.TestServer()
  defer testServer.Close()

  client, err := bigquery.NewClient(
    ctx,
    projectID,
    option.WithEndpoint(testServer.URL),
    option.WithoutAuthentication(),
  )
  if err != nil {
    panic(err)
  }
  defer client.Close()
  routineName, err := client.Dataset(datasetID).Routine(routineID).Identifier(bigquery.StandardSQLID)
  if err != nil {
    panic(err)
  }
  sql := fmt.Sprintf(`
CREATE FUNCTION %s(
  arr ARRAY<STRUCT<name STRING, val INT64>>
) AS (
  (SELECT SUM(IF(elem.name = "foo",elem.val,null)) FROM UNNEST(arr) AS elem)
)`, routineName)
  job, err := client.Query(sql).Run(ctx)
  if err != nil {
    panic(err)
  }
  status, err := job.Wait(ctx)
  if err != nil {
    panic(err)
  }
  if err := status.Err(); err != nil {
    panic(err)
  }

  it, err := client.Query(fmt.Sprintf(`
SELECT %s([
  STRUCT<name STRING, val INT64>("foo", 10),
  STRUCT<name STRING, val INT64>("bar", 40),
  STRUCT<name STRING, val INT64>("foo", 20)
])`, routineName)).Read(ctx)
  if err != nil {
    panic(err)
  }

  var row []bigquery.Value
  if err := it.Next(&row); err != nil {
    if err == iterator.Done {
      return
    }
    panic(err)
  }
  fmt.Println(row[0]) // 30
}

Debugging

If you have specified a database file when starting bigquery-emulator, you can check the status of the database by using the zetasqlite-cli tool. See here for details.

How it works

BigQuery Emulator Architecture Overview

After a ZetaSQL query is received via the REST API from bq or a client SDK, go-zetasqlite parses and analyzes it to produce an AST. A SQLite query is then generated from the AST, and go-sqlite3 is used to access the SQLite database.

Type Conversion Flow

BigQuery has a number of types that do not exist in SQLite (e.g. ARRAY and STRUCT). To handle them in SQLite, go-zetasqlite encodes every type except INT64 / FLOAT64 / BOOL as a combination of type information and data, and stores the result in SQLite. When the encoded data is used, it is first decoded by a custom function registered with go-sqlite3.
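
To make the idea of "type information plus data" concrete, here is a small round-trip sketch. The layout used (a JSON envelope with a type name and payload, base64-encoded into a text value) is an assumption chosen for illustration and is not claimed to be go-zetasqlite's actual storage format; the encodedValue type and the encode and decode functions are hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// encodedValue pairs a BigQuery type name with a JSON payload.
// Hypothetical layout for illustration only, not go-zetasqlite's real format.
type encodedValue struct {
	Type string          `json:"type"`
	Data json.RawMessage `json:"data"`
}

// encode wraps v with its type name and returns a text value that could be
// stored in a SQLite TEXT column.
func encode(typeName string, v interface{}) (string, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	raw, err := json.Marshal(encodedValue{Type: typeName, Data: data})
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decode reverses encode: it fills out with the payload and returns the
// recorded type name. In the emulator this role is played by a custom SQL
// function registered with go-sqlite3.
func decode(s string, out interface{}) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", err
	}
	var ev encodedValue
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", err
	}
	if err := json.Unmarshal(ev.Data, out); err != nil {
		return "", err
	}
	return ev.Type, nil
}

func main() {
	text, err := encode("ARRAY<INT64>", []int64{1, 2, 3})
	if err != nil {
		panic(err)
	}
	var back []int64
	typ, err := decode(text, &back)
	if err != nil {
		panic(err)
	}
	fmt.Println(typ, back) // ARRAY<INT64> [1 2 3]
}
```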

Reference

Regarding the story of bigquery-emulator, there are the following articles.

License

MIT
