databricks-cost-leak-hunter
sample output
source-cited
design review · #790
A $100K/mo workspace is likely burning ~$27,000/month
That's ~$324K/year. The $100K/mo spend is the only assumed input; the 27% waste rate is a published survey figure [1]. Every row below is one config change.
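The headline takes two multiplications. Below is a minimal Python sketch of them. The $100K/month spend is the assumed input, so swap in your own. The 27% is the Flexera survey rate. The helper name is illustrative and is not part of the skill.

```python
# Headline waste estimate. MONTHLY_SPEND_USD is the one assumed input;
# WASTE_PCT is the Flexera 2025 survey figure [1]. The helper name is
# illustrative, not part of the skill.
MONTHLY_SPEND_USD = 100_000
WASTE_PCT = 27


def waste_estimate(monthly_spend: float, waste_pct: float = WASTE_PCT) -> tuple[float, float]:
    """Return (monthly_waste, annual_waste) in the same currency as the input."""
    monthly = monthly_spend * waste_pct / 100
    return monthly, monthly * 12


monthly, annual = waste_estimate(MONTHLY_SPEND_USD)
print(f"~${monthly:,.0f}/month, ~${annual:,.0f}/year")  # ~$27,000/month, ~$324,000/year
```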
| # | Where it's leaking | $/month | The fix |
|---|---|---|---|
| 1 | Clusters that never auto-terminate: idle compute is one of the largest cloud-waste categories, and utilization is chronically low [3, 9] | $12,000 | Set auto-termination to 30 min |
| 2 | Scheduled jobs on All-Purpose Compute: billed at $0.55/DBU vs $0.15/DBU for Jobs Compute (AWS Premium list price), which can cost 2–3× more for the same work [4, 5, 6] | $7,000 | Run scheduled jobs on job clusters (Jobs Compute) |
| 3 | Clusters sized for peak, idling below threshold: production is typically overprovisioned 30–50% [3, 10] | $5,000 | Turn on autoscaling and lower the minimum worker count |
| 4 | Photon billed at ~2× DBUs on jobs it doesn't accelerate: the premium only pays off at a ≥2× speedup [6, 7] | $3,000 | Disable Photon where it adds no runtime gain |
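Rows 2 and 4 rest on unit economics that are easy to check. The sketch below uses the cited list prices and the cited ~2× Photon multiplier. The 1,000-DBU job and its $150 VM bill are hypothetical.

It also shows one way a 3.7× per-DBU gap can land nearer 2–3× on a full bill. On classic compute, the cloud provider's VM charges are billed separately and don't depend on the compute type. That framing is an assumption of this sketch, not a claim from the sources. The Photon ratio counts DBU spend only, which matches the cited breakeven.

```python
# Unit economics behind rows 2 and 4. Prices are the cited AWS Premium
# list prices [4]; the Photon multiplier is the cited ~2x premium [7].
ALL_PURPOSE_PER_DBU = 0.55
JOBS_PER_DBU = 0.15
PHOTON_DBU_MULTIPLIER = 2.0


def run_cost(dbus: float, dbu_price: float, vm_cost: float = 0.0) -> float:
    """All-in cost of one run: DBU charges plus separately billed VM charges."""
    return dbus * dbu_price + vm_cost


def photon_dbu_cost_ratio(speedup: float, multiplier: float = PHOTON_DBU_MULTIPLIER) -> float:
    """Photon DBU spend relative to the standard engine for the same job.

    Runtime shrinks by `speedup`, but each hour bills `multiplier` times the
    DBUs, so the ratio is multiplier / speedup; below 1.0 Photon saves money.
    """
    return multiplier / speedup


# Hypothetical job: 1,000 DBUs and $150 of VM time per run, with both assumed
# identical on either compute type because hardware and runtime don't change.
DBUS, VM_COST = 1_000, 150.0
per_dbu_gap = ALL_PURPOSE_PER_DBU / JOBS_PER_DBU
all_in_gap = run_cost(DBUS, ALL_PURPOSE_PER_DBU, VM_COST) / run_cost(DBUS, JOBS_PER_DBU, VM_COST)
print(f"per-DBU gap {per_dbu_gap:.2f}x, all-in gap {all_in_gap:.2f}x")  # 3.67x, 2.33x
for s in (1.5, 2.0, 3.0):
    print(f"Photon at {s}x speedup: {photon_dbu_cost_ratio(s):.2f}x the DBU spend")
```

At exactly a 2× speedup, Photon costs the same in DBUs as the standard engine. Below 2× it costs more, and above 2× it costs less.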
The #1 line alone — auto-termination — is ~$144K/year, fixed in one setting.
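Each fix maps onto a field in a Databricks cluster or job definition. The sketch below edits plain dicts using Clusters and Jobs API field names: `autotermination_minutes`, `num_workers`, `autoscale`, `runtime_engine`, `existing_cluster_id`, and `new_cluster`. Several pieces are hypothetical choices for illustration, not the skill's logic: both helper functions, the quarter-of-peak autoscaling floor, and the measured-speedup input.

```python
import copy
from typing import Optional

# Field names follow the Databricks Clusters and Jobs API; the thresholds
# and both helpers are hypothetical illustrations, not the skill's logic.
AUTO_TERMINATE_MIN = 30          # row 1: the timeout from the table
PHOTON_BREAKEVEN_SPEEDUP = 2.0   # row 4: Photon bills ~2x DBUs [7]


def fix_cluster(spec: dict, photon_speedup: Optional[float] = None) -> dict:
    """Return a copy of a cluster spec with rows 1, 3 and 4 applied."""
    fixed = copy.deepcopy(spec)

    # Row 1: a missing or zero timeout means the cluster never auto-terminates.
    if not fixed.get("autotermination_minutes"):
        fixed["autotermination_minutes"] = AUTO_TERMINATE_MIN

    # Row 3: swap a fixed peak size for autoscaling with a lower floor.
    # A floor of a quarter of peak (at least 1 worker) is an arbitrary example;
    # single-node and one-worker clusters are left alone.
    if fixed.get("num_workers", 0) > 1:
        peak = fixed.pop("num_workers")
        fixed["autoscale"] = {"min_workers": max(1, peak // 4), "max_workers": peak}

    # Row 4: drop Photon when the measured speedup doesn't cover the premium.
    if (
        fixed.get("runtime_engine") == "PHOTON"
        and photon_speedup is not None
        and photon_speedup < PHOTON_BREAKEVEN_SPEEDUP
    ):
        fixed["runtime_engine"] = "STANDARD"
    return fixed


def move_task_to_jobs_compute(task: dict, cluster_spec: dict) -> dict:
    """Row 2: give a task pinned to an all-purpose cluster its own job cluster."""
    fixed = copy.deepcopy(task)
    if "existing_cluster_id" in fixed:
        del fixed["existing_cluster_id"]
        new_cluster = copy.deepcopy(cluster_spec)
        # Job clusters shut down when the run ends, so no idle timeout is needed.
        new_cluster.pop("autotermination_minutes", None)
        fixed["new_cluster"] = new_cluster
    return fixed


spec = {"cluster_name": "etl-shared", "num_workers": 8, "runtime_engine": "PHOTON"}
print(fix_cluster(spec, photon_speedup=1.3))
print(move_task_to_jobs_compute({"task_key": "nightly", "existing_cluster_id": "abc-123"}, {"num_workers": 2}))
```

In the usage line, the fixed 8-worker Photon cluster has a measured 1.3× speedup. It comes back with a 30-minute timeout, autoscaling from 2 to 8 workers, and the standard engine.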
For scale: in an independent case study of one Databricks customer, Nucleus Research measured a 375% ROI and a 6-month payback [8]. Getting the platform's cost posture right has real, documented upside.
What's assumed, what's cited. The $100K/month workspace spend is the only assumed input; your number goes here. The following all come from published sources (numbered below):

- the 27% waste rate
- the $0.55-vs-$0.15 rate gap and the 2–3× multiplier
- the ~2× Photon premium
- the ranking of idle and overprovisioned compute as the dominant waste categories

The per-row dollar split is an illustrative allocation of the $27K, ranked by documented waste-category size. When the skill runs, every dollar figure is computed from the customer's own `system.billing.usage` table and is never estimated.
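Computing a dollar figure comes down to multiplying billed DBUs by their price and grouping the result. Below is a pure-Python sketch of that step. It runs over in-memory rows shaped like a few columns of `system.billing.usage`: `sku_name`, `usage_quantity`, and `usage_metadata.cluster_id`.

The rows, SKU names, prices, and helpers are hypothetical placeholders. Against a real workspace, prices would come from `system.billing.list_prices`, and the aggregation would run as a query rather than a Python loop.

```python
from collections import defaultdict

# Rows shaped like a few columns of system.billing.usage. Values, SKU names
# and prices are hypothetical; in a real workspace prices would be looked up
# in system.billing.list_prices rather than hard-coded.
usage_rows = [
    {"sku_name": "PREMIUM_ALL_PURPOSE_COMPUTE", "usage_quantity": 400.0,
     "usage_metadata": {"cluster_id": "c-shared"}},
    {"sku_name": "PREMIUM_JOBS_COMPUTE", "usage_quantity": 900.0,
     "usage_metadata": {"cluster_id": "c-job-1"}},
    {"sku_name": "PREMIUM_ALL_PURPOSE_COMPUTE", "usage_quantity": 100.0,
     "usage_metadata": {"cluster_id": "c-shared"}},
]
price_per_dbu = {"PREMIUM_ALL_PURPOSE_COMPUTE": 0.55, "PREMIUM_JOBS_COMPUTE": 0.15}


def dollars_by(rows, prices, key_fn):
    """Sum usage_quantity x unit price, grouped by key_fn(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row["usage_quantity"] * prices[row["sku_name"]]
    return dict(totals)


def by_cluster(row):
    """Group key: the cluster that produced the usage."""
    return row["usage_metadata"]["cluster_id"]


def compute_type(row):
    """Group key: compute type, assuming the SKU name spells it out."""
    sku = row["sku_name"]
    if "ALL_PURPOSE" in sku:
        return "all_purpose"
    if "JOBS" in sku:
        return "jobs"
    return "other"


for label, key_fn in (("cluster", by_cluster), ("compute type", compute_type)):
    totals = dollars_by(usage_rows, price_per_dbu, key_fn)
    print(label, {k: round(v, 2) for k, v in totals.items()})
```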
Sources
1. Flexera, "2025 State of the Cloud Report": respondents estimate 27% of cloud spend is wasted, and 84% say managing cloud spend is their top cloud challenge. Press release · Report PDF
2. CloudZero, "Reduce Cloud Waste": cloud waste runs ~32% of spend (up to one-third), over $200B globally, corroborating Flexera. cloudzero.com/blog/cloud-waste
3. CloudZero, "Cloud Rightsizing": overprovisioned and idle resources are the largest waste categories. Production is typically overprovisioned 30–50% (non-production 70%+), and on Kubernetes average CPU utilization is ~10%: "90 cents of every dollar spent on Kubernetes compute buys idle capacity." cloudzero.com/blog/cloud-rightsizing
4. Flexera, "Databricks pricing guide" (2026): All-Purpose Compute $0.55/DBU, Jobs Compute $0.15/DBU (AWS Premium); "Using All-Purpose Compute for jobs that belong on Jobs Compute can cost 2 to 3 times more for the same workload." flexera.com/blog/finops/databricks-pricing-guide
5. CloudZero, "Databricks pricing guide": "All-Purpose Compute clusters … can cost 2–3X more per DBU than Jobs Compute clusters used for automated pipelines." cloudzero.com/blog/databricks-pricing
6. Databricks, "Best Practices for Cost Management" (2022): customers "saved tens of thousands of dollars by simply moving just ten percent of their workloads from all-purpose clusters to jobs clusters"; Photon delivers a 3–8× performance gain; spot instances give up to 90% off VM compute. databricks.com/blog/best-practices-cost-management-databricks
7. B EYE, Photon guide · Databricks Community: Photon carries a ~2× DBU premium on classic compute. "Databricks charges approximately 2× DBUs for Photon … the breakeven point is roughly a 2× speed improvement," so Photon only saves money when it makes a job at least 2× faster.
8. Nucleus Research, "Databricks ROI Case Study: Texas Rangers" (Mar 2024): 375% ROI, 6-month payback, 4× cost-effectiveness vs the prior cloud data warehouse, 61% data-team productivity gain. nucleusresearch.com
9. Q. Liu and Z. Yu, "The Elasticity and Plasticity in Semi-Containerized Co-locating Cloud Workload," ACM Symposium on Cloud Computing, 2018: datacenter resource utilization is "very low, which wastes a huge amount of infrastructure investment and energy." doi.org/10.1145/3267809.3267830
10. I. Matthew, "Enhancing Cloud Sustainability by Optimizing Cloud Computing Through Right-Sizing and Autoscaling," IEEE ACDSA, 2026: right-sizing plus threshold-based autoscaling measurably reduces idle and overprovisioned resource use. doi.org/10.1109/ACDSA67686.2026.11467824