Get a free observability report to evaluate the potential savingsContact us →
Cost Analysis3 min read

BigQuery Cost Analysis by Label

BigQuery job labels let you tag queries with arbitrary key-value metadata — team names, pipeline IDs, feature flags, or environment tags. This query aggregates costs by label so you can attribute spending to the business dimension that matters most to your organisation.

Why This Matters

User-level cost attribution only goes so far. In modern data stacks, a single service account might run queries on behalf of dozens of teams or pipelines. Labels bridge this gap by letting you tag jobs with business context (e.g., team=analytics, pipeline=revenue_report) and then slice costs accordingly. Without label-based attribution you are flying blind on where your money actually goes.

How It Works

BigQuery stores labels as a repeated STRUCT field on each job. This query CROSS JOINs the labels array, deduplicates jobs, and then aggregates total_bytes_billed by each label key-value pair. The result is an on-demand cost estimate per label combination.

SQL Query

Fill in your details to get a ready-to-run query:

SQL
-- Break down on-demand cost by job label

DECLARE lookback_days INT64 DEFAULT 30;

WITH flattened AS (
  SELECT
    job_id,
    lbl.key AS label_key,
    lbl.value AS label_value,
    COALESCE(total_bytes_billed, 0) AS bytes_billed,
    ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY end_time DESC) AS rn
  FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  CROSS JOIN UNNEST(labels) AS lbl
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL lookback_days DAY)
    AND state = 'DONE'
    AND total_slot_ms IS NOT NULL
    AND lbl.key IS NOT NULL
),
deduplicated AS (
  SELECT label_key, label_value, bytes_billed
  FROM flattened WHERE rn = 1
)
SELECT
  label_key,
  label_value,
  COUNT(*) AS job_count,
  ROUND(SUM(bytes_billed) / POW(1024, 3), 2) AS gib_scanned,
  ROUND(SUM(bytes_billed) / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd
FROM deduplicated
GROUP BY label_key, label_value
...
Replace your-project and region-us with your GCP project and dataset region.

Query Explanation

The query CROSS JOINs the labels repeated field to flatten one row per label per job. It deduplicates with ROW_NUMBER() on job_id, then groups by label_key and label_value to sum total_bytes_billed. The on-demand cost is calculated at $6.25 per TiB.

Key Insights

  • lightbulb

    If most of your jobs lack labels, this query will only show a fraction of your costs — start by labelling your most expensive pipelines.

  • lightbulb

    Compare label-level costs month-over-month to detect cost regressions introduced by new features or pipeline changes.

  • lightbulb

    Labels like 'environment=production' vs 'environment=staging' can reveal if non-production workloads are costing more than expected.

  • lightbulb

    dbt automatically adds labels with the model name, making this query especially powerful for dbt shops.

Best Practices

  1. 1

    Enforce a labelling policy: require at least a 'team' and 'pipeline' label on all scheduled jobs.

  2. 2

    Use BigQuery's default query labels in your orchestration tool (Airflow, dbt, Dataform) to auto-tag every job.

  3. 3

    Create a dashboard that tracks label-level costs over time to catch regressions early.

  4. 4

    Alert on labels whose weekly cost exceeds a threshold.

Want CloudClerk to find these savings automatically?

Our platform connects to your BigQuery project, runs these analyses automatically, and delivers AI-powered optimisation recommendations — all with your data fully anonymised.

Related Guides