• Stars
    star
    190
  • Rank 203,739 (Top 5 %)
  • Language HCL
  • License
    Apache License 2.0
  • Created about 6 years ago
  • Updated 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Creates opinionated BigQuery datasets and tables

terraform-google-bigquery

This module allows you to create opinionated Google Cloud Platform BigQuery datasets and tables. This will allow the user to programmatically create an empty table schema inside of a dataset, ready for loading. Additional user accounts and permissions are necessary to begin querying the newly created table(s).

Compatibility

This module is meant for use with Terraform 0.13+ and tested using Terraform 1.0+. If you find incompatibilities using Terraform >=0.13, please open an issue. If you haven't upgraded and need a Terraform 0.12.x-compatible version of this module, the last released version intended for Terraform 0.12.x is v4.5.0.

Upgrading

The current version is 4.X. The following guides are available to assist with upgrades:

Usage

Basic usage of this module is as follows:

module "bigquery" {
  source  = "terraform-google-modules/bigquery/google"
  version = "~> 6.1"

  dataset_id                  = "foo"
  dataset_name                = "foo"
  description                 = "some description"
  project_id                  = "<PROJECT ID>"
  location                    = "US"
  default_table_expiration_ms = 3600000

  tables = [
  {
    table_id           = "foo",
    schema             =  "<SCHEMA JSON DATA>",
    time_partitioning  = {
      type                     = "DAY",
      field                    = null,
      require_partition_filter = false,
      expiration_ms            = null,
    },
    range_partitioning = null,
    expiration_time = null,
    clustering      = ["fullVisitorId", "visitId"],
    labels          = {
      env      = "dev"
      billable = "true"
      owner    = "joedoe"
    },
  },
  {
    table_id           = "bar",
    schema             =  "<SCHEMA JSON DATA>",
    time_partitioning  = null,
    range_partitioning = {
      field = "customer_id",
      range = {
        start    = "1"
        end      = "100",
        interval = "10",
      },
    },
    expiration_time    = 2524604400000, # 2050/01/01
    clustering         = [],
    labels = {
      env      = "devops"
      billable = "true"
      owner    = "joedoe"
    }
  }
  ],

  views = [
    {
      view_id    = "barview",
      use_legacy_sql = false,
      query          = <<EOF
      SELECT
       column_a,
       column_b,
      FROM
        `project_id.dataset_id.table_id`
      WHERE
        approved_user = SESSION_USER
      EOF,
      labels = {
        env      = "devops"
        billable = "true"
        owner    = "joedoe"
      }
    }
  ]
  dataset_labels = {
    env      = "dev"
    billable = "true"
  }
}

Functional examples are included in the examples directory.

Variable tables detailed description

The tables variable should be provided as a list of object with the following keys:

{
  table_id = "some_id"                        # Unique table id (will be used as ID for table).
  table_name = "Friendly Name"                # Optional friendly name for table. If not set, the "table_id" will be used by default.
  schema = file("path/to/schema.json")        # Schema as JSON string.
  time_partitioning = {                       # Set it to `null` to omit partitioning configuration for the table.
        type                     = "DAY",     # The only type supported is DAY, which will generate one partition per day based on data loading time.
        field                    = null,      # The field used to determine how to create a time-based partition. If time-based partitioning is enabled without this value, the table is partitioned based on the load time. Set it to `null` to omit configuration.
        require_partition_filter = false,     # If set to true, queries over this table require a partition filter that can be used for partition elimination to be specified. Set it to `null` to omit configuration.
        expiration_ms            = null,      # Number of milliseconds for which to keep the storage for a partition.
      },
  range_partitioning = {                      # Set it to `null` to omit partitioning configuration for the table.
    field = "integer_column",                 # The column used to create the integer range partitions.
    range = {
      start    = "1"                          # The start of range partitioning, inclusive.
      end      = "100",                       # The end of range partitioning, exclusive.
      interval = "10",                        # The width of each range within the partition.
        },
      },
  clustering = ["fullVisitorId", "visitId"]   # Specifies column names to use for data clustering. Up to four top-level columns are allowed, and should be specified in descending priority order. Partitioning should be configured in order to use clustering.
  expiration_time = 2524604400000             # The time when this table expires, in milliseconds since the epoch. If set to `null`, the table will persist indefinitely.
  labels = {                                  # A mapping of labels to assign to the table.
      env      = "dev"
      billable = "true"
    }
}

Variable views detailed description

The views variable should be provided as a list of object with the following keys:

{
  view_id = "some_id"                                                # Unique view id. it will be set to friendly name as well
  query = "Select user_id, name from `project_id.dataset_id.table`"  # the Select query that will create the view. Tables should be created before.
  use_legacy_sql = false                                             # whether to use legacy sql or standard sql
  labels = {                                                         # A mapping of labels to assign to the view.
      env      = "dev"
      billable = "true"
  }
}

Variable routines detailed description

The routines variable should be provided as a list of object with the following keys:

{
  routine_id = "some_id"                     # The ID of the routine. The ID must contain only letters, numbers, or underscores. The maximum length is 256 characters.
  routine_type = "PROCEDURE"                 # The type of routine. Possible values are SCALAR_FUNCTION and PROCEDURE.
  language = "SQL"                           # The language of the routine. Possible values are SQL and JAVASCRIPT.
  definition_body = "CREATE FUNCTION test return x*y;"  # The body of the routine. For functions, this is the expression in the AS clause. If language=SQL, it is the substring inside (but excluding) the parentheses.
  return_type     = null                     # A JSON schema for the return type. Optional if language = "SQL"; required otherwise. If absent, the return type is inferred from definitionBody at query time in each query that references this routine. If present, then the evaluated result will be cast to the specified returned type at query time.
  description = "Description"               # The description of the routine if defined.
  arguments = [                             # Set it to `null` to omit arguments block configuration for the routine.
    {
      name      = "x",                      # The name of this argument. Can be absent for function return argument.
      data_type = null,                     # A JSON schema for the data type. Required unless argumentKind = ANY_TYPE.
      argument_kind = "ANY_TYPE"            # Defaults to FIXED_TYPE. Default value is FIXED_TYPE. Possible values are FIXED_TYPE and ANY_TYPE.
      mode = null                           # Specifies whether the argument is input or output. Can be set for procedures only. Possible values are IN, OUT, and INOUT.
    }
  ]
}

A detailed example with authorized views can be found here.

Features

This module provisions a dataset and a list of tables with associated JSON schemas and views from queries.

Inputs

Name Description Type Default Required
access An array of objects that define dataset access for one or more entities. any
[
{
"role": "roles/bigquery.dataOwner",
"special_group": "projectOwners"
}
]
no
dataset_id Unique ID for the dataset being provisioned. string n/a yes
dataset_labels Key value pairs in a map for dataset labels map(string) {} no
dataset_name Friendly name for the dataset being provisioned. string null no
default_table_expiration_ms TTL of tables using the dataset in MS number null no
delete_contents_on_destroy (Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present. bool null no
deletion_protection Whether or not to allow Terraform to destroy the instance. Unless this field is set to false in Terraform state, a terraform destroy or terraform apply that would delete the instance will fail bool false no
description Dataset description. string null no
encryption_key Default encryption key to apply to the dataset. Defaults to null (Google-managed). string null no
external_tables A list of objects which include table_id, expiration_time, external_data_configuration, and labels.
list(object({
table_id = string,
description = optional(string),
autodetect = bool,
compression = string,
ignore_unknown_values = bool,
max_bad_records = number,
schema = string,
source_format = string,
source_uris = list(string),
csv_options = object({
quote = string,
allow_jagged_rows = bool,
allow_quoted_newlines = bool,
encoding = string,
field_delimiter = string,
skip_leading_rows = number,
}),
google_sheets_options = object({
range = string,
skip_leading_rows = number,
}),
hive_partitioning_options = object({
mode = string,
source_uri_prefix = string,
}),
expiration_time = string,
labels = map(string),
}))
[] no
location The regional location for the dataset only US and EU are allowed in module string "US" no
materialized_views A list of objects which includes view_id, view_query, clustering, time_partitioning, range_partitioning, expiration_time and labels
list(object({
view_id = string,
description = optional(string),
query = string,
enable_refresh = bool,
refresh_interval_ms = string,
clustering = list(string),
time_partitioning = object({
expiration_ms = string,
field = string,
type = string,
require_partition_filter = bool,
}),
range_partitioning = object({
field = string,
range = object({
start = string,
end = string,
interval = string,
}),
}),
expiration_time = string,
labels = map(string),
}))
[] no
max_time_travel_hours Defines the time travel window in hours number null no
project_id Project where the dataset and table are created string n/a yes
routines A list of objects which include routine_id, routine_type, routine_language, definition_body, return_type, routine_description and arguments.
list(object({
routine_id = string,
routine_type = string,
language = string,
definition_body = string,
return_type = string,
description = string,
arguments = list(object({
name = string,
data_type = string,
argument_kind = string,
mode = string,
})),
}))
[] no
tables A list of objects which include table_id, table_name, schema, clustering, time_partitioning, range_partitioning, expiration_time and labels.
list(object({
table_id = string,
description = optional(string),
table_name = optional(string),
schema = string,
clustering = list(string),
time_partitioning = object({
expiration_ms = string,
field = string,
type = string,
require_partition_filter = bool,
}),
range_partitioning = object({
field = string,
range = object({
start = string,
end = string,
interval = string,
}),
}),
expiration_time = string,
labels = map(string),
}))
[] no
views A list of objects which include view_id and view query
list(object({
view_id = string,
description = optional(string),
query = string,
use_legacy_sql = bool,
labels = map(string),
}))
[] no

Outputs

Name Description
bigquery_dataset Bigquery dataset resource.
bigquery_external_tables Map of BigQuery external table resources being provisioned.
bigquery_tables Map of bigquery table resources being provisioned.
bigquery_views Map of bigquery view resources being provisioned.
external_table_ids Unique IDs for any external tables being provisioned
external_table_names Friendly names for any external tables being provisioned
project Project where the dataset and tables are created
routine_ids Unique IDs for any routine being provisioned
table_ids Unique id for the table being provisioned
table_names Friendly name for the table being provisioned
view_ids Unique id for the view being provisioned
view_names friendlyname for the view being provisioned

Requirements

These sections describe requirements for using this module.

Software

The following dependencies must be available:

Service Account

A service account with the following roles must be used to provision the resources of this module:

  • BigQuery Data Owner: roles/bigquery.dataOwner

The Project Factory module and the IAM module may be used in combination to provision a service account with the necessary roles applied.

Script Helper

A helper script for configuring a Service Account is located at (./helpers/setup-sa.sh).

APIs

A project with the following APIs enabled must be used to host the resources of this module:

  • BigQuery JSON API: bigquery-json.googleapis.com

The Project Factory module can be used to provision a project with the necessary APIs enabled.

Contributing

Refer to the contribution guidelines for information on contributing to this module.

More Repositories

1

terraform-example-foundation

Shows how the CFT modules can be composed to build a secure cloud foundation
HCL
1,211
star
2

terraform-google-kubernetes-engine

Configures opinionated GKE clusters
HCL
1,131
star
3

terraform-google-project-factory

Creates an opinionated Google Cloud project by using Shared VPC, IAM, and Google Cloud APIs
HCL
826
star
4

terraform-google-network

Sets up a new VPC network on Google Cloud
HCL
411
star
5

terraform-google-lb-http

Creates a global HTTP load balancer for Compute Engine by using forwarding rules
HCL
315
star
6

terraform-docs-samples

Terraform samples intended for inclusion in cloud.google.com
HCL
290
star
7

terraform-google-sql-db

Creates a Cloud SQL database instance
HCL
263
star
8

terraform-google-vm

Provisions VMs in Google Cloud
HCL
220
star
9

terraform-google-bootstrap

Bootstraps Terraform usage and related CI/CD in a new Google Cloud organization
HCL
210
star
10

terraform-google-vault

Deploys Vault on Compute Engine
HCL
192
star
11

terraform-google-iam

Manages multiple IAM roles for resources on Google Cloud
HCL
189
star
12

terraform-google-github-actions-runners

Creates self-hosted GitHub Actions Runners on Google Cloud
HCL
181
star
13

terraform-google-cloud-storage

Creates one or more Cloud Storage buckets and assigns basic permissions on them to arbitrary users
HCL
168
star
14

terraform-google-container-vm

Deploys containers on Compute Engine instances
HCL
155
star
15

terraform-google-gcloud

Executes Google Cloud CLI commands within Terraform
HCL
138
star
16

terraform-google-bastion-host

Generates a bastion host VM compatible with OS Login and IAP Tunneling that can be used to access internal VMs
HCL
126
star
17

terraform-google-service-accounts

Creates one or more service accounts and grants them basic roles
HCL
115
star
18

docs-examples

Open in Cloud Shell Examples for the Google provider docs
HCL
110
star
19

cloud-foundation-training

HCL
96
star
20

terraform-google-lb

Creates a regional TCP proxy load balancer for Compute Engine by using target pools and forwarding rules
HCL
92
star
21

terraform-google-gke-gitlab

Installs GitLab on Kubernetes Engine
HCL
90
star
22

terraform-google-vpn

Sets up a Cloud VPN gateway
HCL
88
star
23

terraform-google-log-export

Creates log exports at the project, folder, or organization level
HCL
88
star
24

terraform-google-pubsub

Creates Pub/Sub topic and subscriptions associated with the topic
HCL
85
star
25

terraform-google-lb-internal

Creates an internal load balancer for Compute Engine by using forwarding rules
HCL
81
star
26

terraform-google-org-policy

Manages Google Cloud organization policies
HCL
80
star
27

terraform-google-cloud-nat

Creates and configures Cloud NAT
HCL
80
star
28

terraform-google-startup-scripts

Provides a library of useful startup scripts to embed in VMs
Shell
73
star
29

terraform-google-k8s-gce

Modular Kubernetes Cluster for GCE using Terraform
HCL
71
star
30

terraform-google-scheduled-function

Sets up a scheduled job to trigger events and run functions
Go
71
star
31

terraform-google-slo

Creates SLOs on Google Cloud from custom Stackdriver metrics capability to export SLOs to Google Cloud services and other systems
HCL
63
star
32

terraform-google-address

Manages Google Cloud IP addresses
Shell
60
star
33

terraform-google-vpc-service-controls

Handles opinionated VPC Service Controls and Access Context Manager configuration and deployments
HCL
60
star
34

terraform-google-cloud-dns

Creates and manages Cloud DNS public or private zones and their records
HCL
57
star
35

terraform-google-event-function

Responds to logging events with a Cloud Function
HCL
52
star
36

terraform-google-composer

Manages Cloud Composer v1 and v2 along with option to manage networking
HCL
52
star
37

terraform-google-module-template

Provides a template for creating a Cloud Foundation Toolkit Terraform module
HCL
52
star
38

terraform-google-cloud-router

Manages a Cloud Router on Google Cloud
HCL
48
star
39

terraform-google-folders

Creates several Google Cloud folders under the same parent
HCL
47
star
40

terraform-google-cloud-operations

Manages Cloud Logging and Cloud Monitoring
HCL
47
star
41

terraform-google-kms

Allows managing a keyring, zero or more keys in the keyring, and IAM role bindings on individual keys
HCL
44
star
42

terraform-google-memorystore

Creates a fully functional Google Memorystore (redis) instance
HCL
43
star
43

terraform-google-group

Manages Google Groups
HCL
40
star
44

terraform-google-dataflow

Handles opinionated Dataflow job configuration and deployments
HCL
34
star
45

terraform-google-jenkins

Creates a Compute Engine instance running Jenkins
HCL
31
star
46

terraform-google-sap

Deploys SAP products
HCL
31
star
47

terraform-google-cloud-datastore

Manages Datastore
HCL
22
star
48

terraform-google-gsuite-export

Creates a Compute Engine VM instance and sets up a cronjob to export GSuite Admin SDK data to Cloud Logging on a schedule
HCL
18
star
49

terraform-google-utils

Gets the short names for a given Google Cloud region
HCL
14
star
50

terraform-google-data-fusion

Manages Cloud Data Fusion
HCL
14
star
51

terraform-google-endpoints-dns

This module creates a DNS record on the .cloud.goog domain using Cloud Endpoints.
HCL
11
star
52

terraform-google-healthcare

Handles opinionated Google Cloud Healthcare datasets and stores
HCL
11
star
53

terraform-google-migrate

Terraform module to help with migrating VMs to GCP.
HCL
10
star
54

terraform-example-shared-services

Example of using CFT to build a Shared Services architecture on GCP
HCL
6
star
55

terraform-google-datalab

Creates DataLab instances with support for GPU instances
HCL
6
star
56

terraform-google-secret

This Terraform module makes it easier to manage to manage secrets for your Google Cloud environment, such as api keys, tokens, etc.
Python
6
star
57

terraform-google-redis

HCL
5
star
58

terraform-google-airflow

HCL
4
star
59

terraform-google-api-police

HCL
3
star
60

.allstar

1
star
61

terraform-google-mariadb

HCL
1
star