
BigQuery Top Queries by Frequency

Repeated queries are a hidden cost multiplier in BigQuery. This query groups identical query texts and counts how many times each has been executed, along with the cumulative cost of all those executions.

Why This Matters

A query that costs $0.10 per execution but runs 10,000 times per month costs $1,000. Frequency analysis often reveals low-cost queries that add up to significant spending due to sheer repetition. These are prime candidates for caching, scheduled materialization, or BI Engine acceleration.
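The arithmetic above generalizes into a simple cost formula. This sketch assumes US on-demand pricing of $6.25 per TiB and an illustrative 16 GiB scan per run; your rate and scan size will differ:

```sql
-- Hypothetical: a query scanning ~16 GiB, run 10,000 times per month
SELECT
  ROUND(
    16 * POW(1024, 3)   -- bytes scanned per run (~16 GiB)
    / POW(1024, 4)      -- convert bytes to TiB
    * 6.25              -- USD per TiB, US on-demand pricing
    * 10000,            -- runs per month
  2) AS monthly_cost_usd;  -- ≈ 976.56, roughly the $1,000 above
```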

How It Works

The query deduplicates jobs by job_id, groups them by exact query text, then counts executions and sums total_bytes_billed per group. The result shows each unique query with its execution count and cumulative on-demand cost.

SQL Query

SQL
-- Find the most frequently repeated queries and their cumulative cost

DECLARE lookback_days INT64 DEFAULT 14;

WITH jobs AS (
  SELECT
    query,
    COALESCE(total_bytes_billed, 0) AS bytes_billed,
    ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY end_time DESC) AS rn
  FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL lookback_days DAY)
    AND job_type = 'QUERY' AND state = 'DONE' AND total_slot_ms IS NOT NULL
),
deduplicated AS (
  SELECT query, bytes_billed FROM jobs WHERE rn = 1
),
grouped AS (
  SELECT
    query,
    COUNT(*) AS exec_count,
    SUM(bytes_billed) AS total_bytes
  FROM deduplicated
  GROUP BY query
)
SELECT
  query,
  exec_count,
  ROUND(total_bytes / POW(1024, 3), 2) AS total_gib_scanned,
  ROUND(total_bytes / POW(1024, 4) * 6.25, 2) AS cumulative_cost_usd
FROM grouped
ORDER BY cumulative_cost_usd DESC
LIMIT 100;
Replace your-project and region-us with your GCP project and dataset region. The $6.25-per-TiB rate in the final line reflects US on-demand pricing; adjust it for your region or pricing model.

Query Explanation

Jobs are deduplicated by job_id first, then grouped by query text. The count gives execution frequency, and the sum of bytes billed gives cumulative cost. This approach efficiently surfaces queries that are run repeatedly at scale.
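One caveat: grouping by exact text misses queries that differ only in literal values (dates, IDs) but are otherwise identical. A rough fingerprinting sketch replaces literals before grouping; the regexes here are illustrative, not exhaustive:

```sql
-- Sketch: collapse queries that differ only in string or numeric literals
SELECT
  REGEXP_REPLACE(
    REGEXP_REPLACE(query, r"'[^']*'", "'?'"),  -- mask string literals
    r'\b\d+\b', '?')                           -- mask numeric literals
    AS query_fingerprint,
  COUNT(*) AS exec_count
FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
  AND job_type = 'QUERY' AND state = 'DONE'
GROUP BY query_fingerprint
ORDER BY exec_count DESC;
```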

Key Insights

  • Queries running >100 times per day are almost certainly automated — check if they can use cached results.

  • BigQuery caches results for 24 hours by default, but only for identical queries with deterministic results.

  • Queries with CURRENT_TIMESTAMP() or RAND() bypass the cache — consider using date truncation instead.

  • High-frequency queries scanning the same tables may benefit from materialized views or BI Engine.
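To see how much of this repeated work the result cache is already absorbing, JOBS_BY_PROJECT exposes a cache_hit flag. A quick sketch (replace the project and region placeholders as before):

```sql
-- What fraction of recent queries were served from the result cache?
SELECT
  COUNTIF(cache_hit) AS cached_queries,
  COUNT(*) AS total_queries,
  ROUND(COUNTIF(cache_hit) / COUNT(*) * 100, 1) AS cache_hit_pct
FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
  AND job_type = 'QUERY' AND state = 'DONE';
```

A low hit rate on high-frequency queries usually points to cache-defeating patterns like those listed above.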

Best Practices

  1. Enable BigQuery result caching and avoid patterns that defeat it (CURRENT_TIMESTAMP, RAND, etc.).

  2. Create materialized views for the top 5 most frequently executed analytical queries.

  3. Consolidate duplicate queries from different dashboards or services into a single scheduled query.

  4. Use BI Engine for sub-second queries powering interactive dashboards.
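For practice 2, a materialized view precomputes the repeated aggregation once and lets BigQuery serve and incrementally refresh it. The dataset and table names below are hypothetical placeholders:

```sql
-- Sketch: materialize a frequently repeated daily aggregation
CREATE MATERIALIZED VIEW `your-project.analytics.daily_orders_mv` AS
SELECT
  order_date,
  COUNT(*) AS orders,
  SUM(amount) AS revenue
FROM `your-project.analytics.orders`
GROUP BY order_date;
```

Dashboards can then query the view directly; BigQuery can also rewrite matching queries against the base table to use it automatically.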

Want CloudClerk to find these savings automatically?

Our platform connects to your BigQuery project, runs these analyses automatically, and delivers AI-powered optimisation recommendations — all with your data fully anonymised.
