We are currently developing a solution that uses Google’s BigQuery for data analytics.
As part of this solution which is written in Golang, we need to perform some complex SQL queries (hundreds of lines each). These queries implement business-logic which is fundamental to our solution, and, as an integral part of our code, they undergo a standard development cycle of experimentation and on-going enhancement.
As the queries evolved, we started running into bugs which were time-consuming to debug and fix, and we realized it is impossible to continue without proper testing, just like any other part of the code.
We posted this question about Testable SQL for BigQuery on reddit were you can see some of the solutions that people proposed, but we decided to publish our own solution (diversity is good, right?).
So we created Espresso, a framework for writing testable BigQuery queries. The idea is to compose complex queries from simpler ones and to be able to write unit tests for each component and for their combinations. The tests can be run against data-as-code (data which is specified as part of the code and controlled by git) or data from the database.
We compose SQL queries by as follows:
WITH base AS (
SELECT
"orange" AS fruit
UNION ALL
SELECT
"apple"
)
SELECT fruit FROM base
This query has two parts, a common table expression (CTE), highlighted in Bold, and the final SELECT fruit FROM base
, referencing the CTE.
We write is as a template:
{{ define "fruit" }}WITH base AS (
{{ .Base }}
)
SELECT fruit FROM base{{ end }}
Where Base is:
{{ define "base" }}SELECT
"orange" AS fruit
UNION ALL
SELECT
"apple"{{ end }}
Note: we’re using Golang template notation.
And now we write a query definition using YAML:
Name: fruit
Requires:
- Base
Tests:
Test1:
Args:
- Name: Base
Source: base
Result:
Source: fruit_result
This tells espresso to load the fruit query and inject the base query into it. The query is then run and the result is compared to the expected result: fruit_result, which, in this simplified case, is identical to Base.
The tests can be embedded into Golang standard unit tests:
//go:embed queries/fruitvar templates embed.FSfunc TestEspressoShot_Embed(t *testing.T) { s := shot.NewShotWithClient(env.GetGCPProjectID(), "", templates)
params := []bigquery.QueryParameter{}
t := map[string]bigquery.Value{} result, expected, err := s.RunTest("fruit", "Test1", params, &t) require.NoError(t, err)
require.ElementsMatch(t, result, expected)
}
Or, from the command-line:
./espresso -dir="./shot/queries/fruit/" -query="fruit" -test="Test1"