Cost Analysis3 min read

BigQuery Cost Analysis by Label

BigQuery job labels let you tag queries with arbitrary key-value metadata — team names, pipeline IDs, feature flags, or environment tags. This query aggregates costs by label so you can attribute spending to the business dimension that matters most to your organisation.

Why This Matters

User-level cost attribution only goes so far. In modern data stacks, a single service account might run queries on behalf of dozens of teams or pipelines. Labels bridge this gap by letting you tag jobs with business context (e.g., team=analytics, pipeline=revenue_report) and then slice costs accordingly. Without label-based attribution you are flying blind on where your money actually goes.

How It Works

BigQuery stores labels as a repeated STRUCT field on each job. This query CROSS JOINs the labels array, deduplicates jobs, and then aggregates total_bytes_billed by each label key-value pair. The result is an on-demand cost estimate per label combination.

SQL Query

Fill in your details to get a ready-to-run query:

GCP Project ID

Dataset Region

Lookback (days)

SQL

-- Break down on-demand cost by job label

DECLARE lookback_days INT64 DEFAULT 30;

WITH flattened AS (
  SELECT
    job_id,
    lbl.key AS label_key,
    lbl.value AS label_value,
    COALESCE(total_bytes_billed, 0) AS bytes_billed,
    ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY end_time DESC) AS rn
  FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  CROSS JOIN UNNEST(labels) AS lbl
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL lookback_days DAY)
    AND state = 'DONE'
    AND total_slot_ms IS NOT NULL
    AND lbl.key IS NOT NULL
),
deduplicated AS (
  SELECT label_key, label_value, bytes_billed
  FROM flattened WHERE rn = 1
)
SELECT
  label_key,
  label_value,
  COUNT(*) AS job_count,
  ROUND(SUM(bytes_billed) / POW(1024, 3), 2) AS gib_scanned,
  ROUND(SUM(bytes_billed) / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd
FROM deduplicated
GROUP BY label_key, label_value
...

Replace your-project and region-us with your GCP project and dataset region.

Query Explanation

The query CROSS JOINs the labels repeated field to flatten one row per label per job. It deduplicates with ROW_NUMBER() on job_id, then groups by label_key and label_value to sum total_bytes_billed. The on-demand cost is calculated at $6.25 per TiB.

Key Insights

lightbulb
If most of your jobs lack labels, this query will only show a fraction of your costs — start by labelling your most expensive pipelines.
lightbulb
Compare label-level costs month-over-month to detect cost regressions introduced by new features or pipeline changes.
lightbulb
Labels like 'environment=production' vs 'environment=staging' can reveal if non-production workloads are costing more than expected.
lightbulb
dbt automatically adds labels with the model name, making this query especially powerful for dbt shops.

Best Practices

1
Enforce a labelling policy: require at least a 'team' and 'pipeline' label on all scheduled jobs.
2
Use BigQuery's default query labels in your orchestration tool (Airflow, dbt, Dataform) to auto-tag every job.
3
Create a dashboard that tracks label-level costs over time to catch regressions early.
4
Alert on labels whose weekly cost exceeds a threshold.

Want CloudClerk to find these savings automatically?

Our platform connects to your BigQuery project, runs these analyses automatically, and delivers AI-powered optimisation recommendations — all with your data fully anonymised.

Get Your Free Report Access the platform →

Related Guides

personCost Analysis

BigQuery Cost Analysis by User

Find which users drive the most BigQuery spending. Query INFORMATION_SCHEMA to rank users by on-demand cost and identify optimization targets.

Read guide

folderCost Analysis

BigQuery Cost Analysis by Project

Track BigQuery costs across GCP projects. Use INFORMATION_SCHEMA to compare project-level spending and find the most expensive projects in your organization.

Read guide

trending_upCost Analysis

BigQuery Top Queries by Cost

Find the most expensive BigQuery queries by on-demand cost. Rank queries by total bytes billed to identify the biggest cost drivers in your project.

Read guide

codeCost Analysis

BigQuery Cost Analysis by Query

Analyze every individual query's cost, slot usage, and execution time. Get a detailed breakdown of on-demand cost per query execution.

Read guide