Budget Planning & Debugging
Estimate costs before running and diagnose overruns after.
You will learn:
- How to use cost query functions to estimate FLOPs without executing
- How to read and interpret the budget summary
- How to diagnose expensive operations using the operation log
- Optimization strategies for reducing FLOP consumption
Estimate before running
Flopscope provides cost query functions that compute FLOP costs from shapes without executing anything or touching the budget. Use these to plan before committing FLOPs:
import flopscope as flops
import flopscope.numpy as fnp
# Einsum cost
cost = flops.einsum_cost('ij,jk->ik', shapes=[(256, 256), (256, 256)])
print(f"Matmul cost: {cost:,}") # 16,777,216 (256^3, FMA=1)
# SVD cost
cost = flops.svd_cost(m=256, n=256, k=10)
print(f"SVD cost: {cost:,}") # 655,360
# Pointwise cost (unary/binary ops like exp, add, multiply)
cost = flops.pointwise_cost("exp", shape=(256, 256))
print(f"Pointwise cost: {cost:,}") # 65,536
# Reduction cost (sum, mean, max, etc.)
cost = flops.reduction_cost("sum", input_shape=(256, 256))
print(f"Reduction cost: {cost:,}") # 65,536
For multi-operand einsums (3+ operands), use fnp.einsum_path() to see the step-by-step contraction breakdown with per-step costs and symmetry savings:
# T, A, B, C are existing arrays with shapes (i, j, k), (a, i), (b, j), (c, k)
path, info = fnp.einsum_path('ijk,ai,bj,ck->abc', T, A, B, C)
print(f"Optimized cost: {info.optimized_cost:,}")
print(f"Naive cost: {info.naive_cost:,}")
print(f"Speedup: {info.speedup:.1f}x")
print(info) # full per-step table
fnp.einsum_path() does not execute the contraction, but it does record a nominal 1-FLOP planning event so the path query itself is still visible in the operation log.
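The single-step einsum figures quoted above can be checked by hand. The sketch below is not flopscope's implementation; it assumes the naive cost of one contraction is the product of the sizes of all distinct indices (one FLOP per fused multiply-add), which reproduces the matmul number shown earlier. The helper name naive_einsum_cost is hypothetical, and the sketch ignores symmetry savings and path optimization.

```python
from math import prod


def naive_einsum_cost(subscripts, shapes):
    """Multiply the sizes of every distinct index in the expression (FMA = 1)."""
    inputs = subscripts.split("->")[0].split(",")
    sizes = {}
    for labels, shape in zip(inputs, shapes):
        if len(labels) != len(shape):
            raise ValueError(f"{labels!r} does not match shape {shape}")
        for label, dim in zip(labels, shape):
            # setdefault stores the first size seen and returns it afterwards
            if sizes.setdefault(label, dim) != dim:
                raise ValueError(f"index {label!r} has inconsistent sizes")
    return prod(sizes.values())


matmul = naive_einsum_cost("ij,jk->ik", [(256, 256), (256, 256)])
matvec = naive_einsum_cost("ij,j->i", [(256, 256), (256,)])
print(f"matmul: {matmul:,}")  # 16,777,216
print(f"matvec: {matvec:,}")  # 65,536
```

A hand check like this is useful as a sanity test on shapes before calling the real cost query; the library's own numbers remain authoritative.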
Budget breakdown example
Plan a multi-step computation before executing:
steps = [
    ("einsum ij,j->i", flops.einsum_cost('ij,j->i', shapes=[(256, 256), (256,)])),
    ("ReLU (maximum)", flops.pointwise_cost("maximum", shape=(256,))),
    ("sum reduction", flops.reduction_cost("sum", input_shape=(256,))),
]
total = sum(cost for _, cost in steps)
print(f"{'Operation':<20} {'FLOPs':>12}")
print("-" * 34)
for name, cost in steps:
    print(f"{name:<20} {cost:>12,}")
print("-" * 34)
print(f"{'Total':<20} {total:>12,}")
Read the budget summary
Call flops.budget_summary() after your computation for a human-readable breakdown, or budget.summary() inside a context. Pass by_namespace=True when you want dotted namespace attribution:
with flops.BudgetContext(flop_budget=10_000_000) as budget:
    A = fnp.ones((256, 256))
    x = fnp.ones((256,))
    h = fnp.einsum('ij,j->i', A, x)
    h = fnp.exp(h)
    h = fnp.sum(h)
    print(budget.summary())
The summary shows cost per operation type, sorted by highest cost first. Look for operations consuming a disproportionate share of the budget. When you opt into by_namespace=True, the display adds a namespace breakdown for the exact dotted paths recorded in that run.
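To make "disproportionate share" concrete, the sketch below ranks per-operation totals and reports each one as a fraction of the whole. It works on a plain dictionary with illustrative values; the dictionary layout and the helper name cost_shares are assumptions for this example, not part of flopscope.

```python
# Illustrative per-operation totals (made-up layout, not a flopscope structure).
op_costs = {"exp": 256, "einsum": 65_536, "sum": 256}


def cost_shares(costs):
    """Return (name, cost, fraction_of_total) tuples, highest cost first."""
    total = sum(costs.values())
    if total == 0:
        return []
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    return [(name, cost, cost / total) for name, cost in ranked]


shares = cost_shares(op_costs)
for name, cost, share in shares:
    print(f"{name:<8} {cost:>8,} {share:6.1%}")
```

With these numbers the einsum accounts for more than 99% of the total, so it is the only operation worth optimizing.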
For programmatic analysis, use flops.budget_summary_dict():
data = flops.budget_summary_dict()
print(f"Budget: {data['flop_budget']:,}")
print(f"Used: {data['flops_used']:,}")
print(f"Left: {data['flops_remaining']:,}")
for op_name, op_data in data["operations"].items():
    print(f" {op_name}: {op_data['flop_cost']:,} ({op_data['calls']} calls)")
Use flops.budget_summary_dict(by_namespace=True), or budget.summary_dict(by_namespace=True) on a context, for exact per-namespace breakdowns keyed by the full dotted path:
with flops.BudgetContext(flop_budget=1000, namespace="predict") as budget:
    x = fnp.ones((1,))
    with fnp.namespace("fallback"):
        with fnp.namespace("sampling"):
            sample = fnp.add(x, 1)
    data = budget.summary_dict(by_namespace=True)
print(data["by_namespace"]["predict.fallback.sampling"]["flops_used"])
Add a time limit when FLOPs are not the only risk
Some operations are analytically cheap enough to fit the FLOP budget but still
slow in practice. Use wall_time_limit_s on the same BudgetContext when you
want a cooperative wall-clock deadline in addition to the FLOP cap:
with flops.BudgetContext(
    flop_budget=10_000_000,
    wall_time_limit_s=2.0,
    namespace="predict",
) as budget:
    # computation must stay within both limits
    ...
    print(budget.summary())
When the time limit is exceeded, Flopscope raises TimeExhaustedError at the next
operation boundary. The summary exposes four timing views that decompose wall time exactly:
- wall_time_s: total elapsed time for the context
- flopscope_backend_time_s: time spent inside the underlying NumPy / BLAS / LAPACK backend calls being counted
- flopscope_overhead_time_s: time spent in flopscope's own dispatch code (wrapper preambles, FLOP cost computation, view-casts, post-call wrapping, maybe_check_nan_inf when opted in via flopscope.configure(check_nan_inf=True))
- residual_wall_time_s: the measured wall-clock remainder outside backend calls and flopscope overhead (user Python between ops, time.sleep, GC pauses, un-instrumented NumPy)
The identity wall_time_s = flopscope_backend_time_s + flopscope_overhead_time_s + residual_wall_time_s holds within numerical tolerance.
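The identity above can be verified on any set of timings. The sketch below uses made-up values and assumes the four views are available in a dictionary keyed by the field names listed above; that layout and the helper names are assumptions for illustration only.

```python
from math import isclose

# Hypothetical timings in seconds, keyed by the field names from the summary.
timings = {
    "wall_time_s": 1.25,
    "flopscope_backend_time_s": 0.90,
    "flopscope_overhead_time_s": 0.05,
    "residual_wall_time_s": 0.30,
}


def decomposition_holds(t, rel_tol=1e-9):
    """Check wall time against the sum of backend, overhead, and residual time."""
    parts = (
        t["flopscope_backend_time_s"]
        + t["flopscope_overhead_time_s"]
        + t["residual_wall_time_s"]
    )
    return isclose(t["wall_time_s"], parts, rel_tol=rel_tol)


def overhead_fraction(t):
    """Share of wall time spent in accounting overhead rather than real work."""
    return t["flopscope_overhead_time_s"] / t["wall_time_s"]


ok = decomposition_holds(timings)
fraction = overhead_fraction(timings)
print(ok, f"{fraction:.1%}")
```

A large residual relative to backend time points at user code between operations rather than at the counted operations themselves.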
Use budget.summary() when you want the current context's timings, and
flops.budget_summary() when you want the accumulated session/global view.
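A cooperative deadline is only checked between operations, so a slow operation can overshoot the limit before the error surfaces. The sketch below illustrates that behavior in plain Python. The class names CooperativeDeadline and DeadlineExceeded are hypothetical stand-ins and do not describe how flopscope implements wall_time_limit_s; a fake clock makes the run deterministic.

```python
import time


class DeadlineExceeded(RuntimeError):
    """Hypothetical stand-in for a time-limit error."""


class CooperativeDeadline:
    def __init__(self, limit_s, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.limit_s = limit_s

    def check(self):
        """Call at each operation boundary; raises once the limit has passed."""
        elapsed = self._clock() - self._start
        if elapsed > self.limit_s:
            raise DeadlineExceeded(f"{elapsed:.3f}s elapsed, limit {self.limit_s:.3f}s")


def run_ops(ops, deadline):
    for op in ops:
        deadline.check()  # checked before each op, never during one
        op()


# Fake clock: start at 0.0s, first check at 0.5s, second check at 2.5s.
ticks = iter([0.0, 0.5, 2.5])
deadline = CooperativeDeadline(limit_s=2.0, clock=lambda: next(ticks))
completed = []
stopped = False
try:
    run_ops([lambda: completed.append("a"), lambda: completed.append("b")], deadline)
except DeadlineExceeded:
    stopped = True
print(completed, stopped)  # ['a'] True
```

The first operation finishes even though it ran past the 2.0 s limit; the error appears only when the second operation is about to start.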
Diagnose overruns
When you hit a BudgetExhaustedError, the budget's operation log gives per-call detail:
for record in budget.op_log:
    print(f"{record.op_name:<16} cost={record.flop_cost:>12,} cumulative={record.cumulative:>12,}")
Each OpRecord contains:
| Field | Description |
|---|---|
| op_name | Operation name (e.g., "einsum", "exp") |
| namespace | Effective namespace path recorded for that operation |
| subscripts | Einsum subscript string, or None |
| shapes | Tuple of input shapes |
| flop_cost | FLOP cost of this single call |
| cumulative | Running total after this call |
| flopscope_context_start_offset_s | Seconds from the active BudgetContext start to when this operation was recorded |
| flopscope_backend_duration_s | Seconds spent in the underlying backend call for this operation |
| flopscope_overhead_duration_s | Seconds of flopscope wrapper/accounting overhead attributed to this operation |
Look for the operation where cumulative jumps sharply -- that is your most expensive call.
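Scanning a long log by eye is error-prone, so the search for the sharpest jump can be automated. The sketch below uses a hypothetical Record type that carries only three of the fields listed above and assumes the running total starts at zero; with a real log you would pass budget.op_log instead of the sample list.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in with three of the OpRecord fields."""
    op_name: str
    flop_cost: int
    cumulative: int


def largest_jump(log):
    """Return the record whose call raised the running total the most."""
    best, best_jump, previous = None, -1, 0
    for record in log:
        jump = record.cumulative - previous
        if jump > best_jump:
            best, best_jump = record, jump
        previous = record.cumulative
    return best


def top_calls(log, n=3):
    """Return the n most expensive calls, highest flop_cost first."""
    return sorted(log, key=lambda r: r.flop_cost, reverse=True)[:n]


# Illustrative log for a matrix-vector product followed by exp and sum.
log = [
    Record("einsum", 65_536, 65_536),
    Record("exp", 256, 65_792),
    Record("sum", 256, 66_048),
]
worst = largest_jump(log)
print(f"{worst.op_name}: {worst.flop_cost:,} FLOPs")
```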
For real-time monitoring during long computations, use the live budget display:
with flops.budget_live():
    with flops.BudgetContext(flop_budget=10**8, namespace="training") as budget:
        for i in range(100):
            # ... computation ...
            pass
# The live display updates automatically as FLOPs are consumed
What to do next
Once you have identified the expensive operations, apply these strategies:
- Reduce dimensions. If random.randn(1024, 1024) is too expensive, try smaller arrays. A 512x512 matrix costs 1/8 the FLOPs of a 1024x1024 matrix for a matmul (512^3 versus 1024^3), and 1/4 for pointwise operations.
- Exploit symmetry. If operands are symmetric, use flops.as_symmetric() to halve pointwise costs and significantly reduce einsum costs. See Symmetry Savings.
- Use cheaper operations. A matrix-vector product costs m*n FLOPs, while a matrix-matrix product costs m*n*k. Avoid computing full matrix products when you only need a few rows or a single vector result.
- Increase budget. If the computation is genuinely needed and you have headroom, raise flop_budget on the BudgetContext.
- Split into phases. Use namespaces to attribute different phases without splitting the FLOP budget into child budgets:
with flops.BudgetContext(flop_budget=10**8, namespace="solver") as budget:
    with fnp.namespace("init"):
        # initialization
        ...
    with fnp.namespace("solve"):
        # main computation
        ...
    print(budget.summary(by_namespace=True))
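The size and operation-choice strategies above come down to simple arithmetic. The sketch below assumes the cost model stated in this guide (m*n for a matrix-vector product, m*n*k for a matrix-matrix product); the helper names are hypothetical and the code does not call flopscope.

```python
def matmul_cost(m, n, k):
    """FLOPs for an (m x n) by (n x k) product under the m*n*k model."""
    return m * n * k


def matvec_cost(m, n):
    """FLOPs for an (m x n) matrix times a length-n vector under the m*n model."""
    return m * n


full = matmul_cost(1024, 1024, 1024)
half = matmul_cost(512, 512, 512)
ratio = half / full  # halving every dimension of a square matmul

vector_only = matvec_cost(1024, 1024)
savings = full // vector_only  # matmul cost relative to one matvec

print(f"1024^3 matmul: {full:,}")        # 1,073,741,824
print(f"512^3 matmul:  {half:,}")        # 134,217,728
print(f"ratio: {ratio}")                 # 0.125
print(f"matvec: {vector_only:,}")        # 1,048,576
print(f"matmul / matvec: {savings:,}x")  # 1,024x
```

Because matmul cost is cubic in the side length, shrinking dimensions usually pays off faster than any other strategy on this list.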