User-defined functions (UDFs) are a feature of SQL supported by BigQuery that enables a user to create a function using another SQL expression or JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value.
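As a quick illustration, a SQL UDF can be defined and called inline in a query; the function name and logic below are hypothetical, not part of this repository:

```sql
-- Hypothetical SQL UDF that multiplies its input by four
CREATE TEMP FUNCTION multiply_by_four(x INT64) AS (x * 4);

SELECT multiply_by_four(col) AS result
FROM UNNEST([1, 2, 3]) AS col;
```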
The community folder contains community-contributed functions that perform some actions in BigQuery. The migration folder contains sub-folders such as teradata, redshift, and oracle which contain community-contributed functions that replicate the behavior of proprietary functions in other data warehouses. These functions can help you achieve feature parity in a migration from another data warehouse to BigQuery.
All UDFs within this repository are available under the `bqutil` project in publicly shared datasets. Queries can then reference the shared UDFs in the US multi-region via `bqutil.<dataset>.<function>()`.
UDFs within this repository are also deployed publicly into every other region that BigQuery supports. To use a UDF in a location outside of the US multi-region, reference it via a dataset with a regional suffix: `bqutil.<dataset>_<region>.<function>()`.
For example, the Teradata `nullifzero` UDF can be referenced in various locations:

```sql
bqutil.td_eu.nullifzero()           -- eu multi-region
bqutil.td_europe_west1.nullifzero() -- europe-west1 region
bqutil.td_asia_south1.nullifzero()  -- asia-south1 region
```
Note: Region suffixes added to dataset names replace `-` with `_` in order to comply with BigQuery dataset naming rules.
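For instance, a query running in the `europe-west1` region might call the shared Teradata `nullifzero` UDF as follows; the table and column names are hypothetical:

```sql
-- Returns NULL where amount is zero, using the regionally suffixed dataset
SELECT
  bqutil.td_europe_west1.nullifzero(amount) AS amount_or_null
FROM my_dataset.orders;  -- hypothetical table
```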
All UDFs within this repository are maintained in SQLX format. This format enables testing and deployment of the UDFs with the Dataform CLI tool.
The Dataform CLI is a useful tool for deploying the UDFs because it:
- Enables unit testing of the UDFs
- Automatically identifies dependencies between UDFs and creates them in the correct order
- Easily deploys the UDFs across different environments (dev, test, prod)
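As a rough sketch, a SQLX UDF file pairs a Dataform config block with the function definition: `${self()}` resolves to the fully qualified UDF name and `${ref(...)}` to the name of a dependency, which is how Dataform infers creation order. The function and dependency below are hypothetical:

```sql
config { hasOutput: true }

CREATE OR REPLACE FUNCTION ${self()}(x INT64)
AS (
  -- Hypothetical dependency on another UDF in the same project
  ${ref("some_other_udf")}(x) + 1
);
```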
The following sections cover a few methods of deploying the UDFs.
- Authenticate using the Cloud SDK and set the BigQuery project in which you'll deploy your UDF(s):

  ```bash
  gcloud init
  ```
- Enable the Cloud Build API and grant the default Cloud Build service account the BigQuery User and BigQuery Data Editor roles:

  ```bash
  gcloud services enable cloudbuild.googleapis.com && \
  gcloud projects add-iam-policy-binding \
    $(gcloud config get-value project) \
    --member=serviceAccount:$(gcloud projects describe $(gcloud config get-value project) --format="value(projectNumber)")"@cloudbuild.gserviceaccount.com" \
    --role=roles/bigquery.user && \
  gcloud projects add-iam-policy-binding \
    $(gcloud config get-value project) \
    --member=serviceAccount:$(gcloud projects describe $(gcloud config get-value project) --format="value(projectNumber)")"@cloudbuild.gserviceaccount.com" \
    --role=roles/bigquery.dataEditor
  ```
- Deploy the UDFs by submitting the following:

  ```bash
  # Deploy to US
  gcloud builds submit . --config=deploy.yaml --substitutions _PROJECT_ID=YOUR_PROJECT_ID,_BQ_LOCATION=US
  ```
Note: Deploy to a different location by setting `_BQ_LOCATION` to your own desired value. Click here for a list of supported locations.
Run the following in your machine's terminal to deploy all UDFs in your own BigQuery project.
- Authenticate using the Cloud SDK and set the BigQuery project in which you'll deploy your UDF(s):

  ```bash
  gcloud init
  ```
- Install the Dataform CLI tool:

  ```bash
  npm i -g @dataform/cli
  ```
- Set the `BQ_LOCATION` environment variable to the BigQuery location in which you want to deploy the UDFs, then run the `deploy.sh` helper script:

  ```bash
  # Deploy to US
  export BQ_LOCATION=US && bash deploy.sh
  ```
Note: Deploy to a different location by setting `BQ_LOCATION` to your own desired value. Click here for a list of supported locations.
If you want to create the UDFs from this repository using the bq command-line tool, then you must make a few modifications to the SQLX files as shown below:
- Remove the first line `config { hasOutput: true }` in each SQLX file.
- Replace any instance of `${self()}` with the fully qualified UDF name.
- Replace any instance of `${ref(SOME_UDF_NAME)}` with the fully qualified UDF name of `SOME_UDF_NAME`.
- Deploy the UDF using either of the following:
  - bq command-line tool: `bq query --nouse_legacy_sql < UDF_SQL_FILE_NAME.sqlx`
  - BigQuery Console: paste the SQL UDF body into the console and execute it.
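After those modifications, a SQLX file reduces to a plain `CREATE FUNCTION` statement. The function name and body below are a hypothetical sketch of the result, not an actual file from this repository:

```sql
-- Hypothetical result of stripping the config block and expanding ${self()}
CREATE OR REPLACE FUNCTION `YOUR_PROJECT_ID.YOUR_DATASET_ID.some_udf`(x INT64)
AS (
  x + 1
);
```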
When creating JavaScript UDFs in your dataset, you need to create the UDF itself and, optionally, copy the JavaScript library to your own Cloud Storage bucket.
The base path for all the compiled JS libraries is `gs://bqutil-lib/bq_js_libs/`.
The following example shows how to create the `levenshtein` UDF, which uses the `js-levenshtein-v1.1.6.js` library, in your own dataset.
- Copy the compiled library to your bucket:

  ```bash
  gsutil cp gs://bqutil-lib/bq_js_libs/js-levenshtein-v1.1.6.js gs://your-bucket
  ```
- Grant permissions on the library. First, if you don't have uniform bucket-level access enabled on your bucket, enable it:

  ```bash
  gsutil uniformbucketlevelaccess set on gs://your-bucket
  ```

  Once done, grant the Storage Object Viewer role at the bucket or project level to a user or group:

  ```bash
  gsutil iam ch [user|group]:[user|group]@domain.com:roles/storage.objectViewer gs://your-bucket
  ```
- Edit the levenshtein.sql SQL file and replace the library path `library="${JS_BUCKET}/js-levenshtein-v1.1.6.js"` with your own path: `library="gs://your-bucket/js-levenshtein-v1.1.6.js"`
- Create the SQL UDF, passing the previously modified SQL file:

  ```bash
  bq query --project_id YOUR_PROJECT_ID --dataset_id YOUR_DATASET_ID --nouse_legacy_sql < levenshtein.sql
  ```
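For reference, a JavaScript UDF that loads a library from Cloud Storage follows the general shape below. This is a hypothetical sketch of such a definition, not the actual contents of levenshtein.sql; the signature and the global function name exposed by the library are assumptions:

```sql
CREATE OR REPLACE FUNCTION YOUR_DATASET_ID.levenshtein(a STRING, b STRING)
RETURNS INT64
LANGUAGE js
-- Point the library option at the copy in your own bucket
OPTIONS (
  library = ["gs://your-bucket/js-levenshtein-v1.1.6.js"]
)
AS """
  // Assumes the compiled library exposes a global `levenshtein` function
  return levenshtein(a, b);
""";
```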
If you are interested in contributing UDFs to this repository, please see the instructions to get started.