
Budget Planning & Debugging

Estimate costs before running and diagnose overruns after.

You will learn:

  • How to use cost query functions to estimate FLOPs without executing
  • How to read and interpret the budget summary
  • How to diagnose expensive operations using the operation log
  • Optimization strategies for reducing FLOP consumption

Estimate before running

Flopscope provides cost query functions that compute FLOP costs from shapes without executing anything or touching the budget. Use these to plan before committing FLOPs:

import flopscope as flops
import flopscope.numpy as fnp

# Einsum cost
cost = flops.einsum_cost('ij,jk->ik', shapes=[(256, 256), (256, 256)])
print(f"Matmul cost: {cost:,}")         # 16,777,216 (256^3, FMA=1)

# SVD cost
cost = flops.svd_cost(m=256, n=256, k=10)
print(f"SVD cost: {cost:,}")            # 655,360

# Pointwise cost (unary/binary ops like exp, add, multiply)
cost = flops.pointwise_cost("exp", shape=(256, 256))
print(f"Pointwise cost: {cost:,}")      # 65,536

# Reduction cost (sum, mean, max, etc.)
cost = flops.reduction_cost("sum", input_shape=(256, 256))
print(f"Reduction cost: {cost:,}")      # 65,536
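
The figures in the comments above are consistent with simple shape arithmetic. The sketch below reproduces them in plain Python for illustration only. The helper names are hypothetical, and the formulas are assumptions inferred from the printed values, not flopscope's implementation.

```python
from math import prod

def sketch_einsum_cost(subscripts, shapes):
    """Assumed rule: product of every distinct index size (FMA counted as 1 FLOP)."""
    inputs = subscripts.split("->")[0].split(",")
    sizes = {}
    for term, shape in zip(inputs, shapes):
        for label, dim in zip(term, shape):
            # setdefault stores the size on first sight and returns the stored size after
            if sizes.setdefault(label, dim) != dim:
                raise ValueError(f"index {label!r} has conflicting sizes")
    return prod(sizes.values())

def sketch_elementwise_cost(shape):
    """Assumed rule: one FLOP per input element, for pointwise ops and reductions."""
    return prod(shape)

print(f"{sketch_einsum_cost('ij,jk->ik', [(256, 256), (256, 256)]):,}")  # 16,777,216
print(f"{sketch_elementwise_cost((256, 256)):,}")                        # 65,536
```

Under the same assumption, the SVD figure above matches m*n*k = 256 * 256 * 10.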

For multi-operand einsums (3+ operands), use fnp.einsum_path() to see the step-by-step contraction breakdown with per-step costs and symmetry savings:

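# Example operands (shapes chosen only for illustration)
T = fnp.ones((8, 8, 8))
A = fnp.ones((8, 8))
B = fnp.ones((8, 8))
C = fnp.ones((8, 8))
path, info = fnp.einsum_path('ijk,ai,bj,ck->abc', T, A, B, C)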
print(f"Optimized cost: {info.optimized_cost:,}")
print(f"Naive cost:     {info.naive_cost:,}")
print(f"Speedup:        {info.speedup:.1f}x")
print(info)  # full per-step table

fnp.einsum_path() does not execute the contraction, but it does record a nominal 1-FLOP planning event so the path query itself is still visible in the operation log.
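
To see why contraction order matters for an expression like 'ijk,ai,bj,ck->abc', the following sketch compares a single six-index loop with three pairwise contractions. It assumes every index has size 8 and that each step costs the product of the sizes of the indices it touches. The helper is hypothetical, ignores symmetry savings, and the order flopscope actually selects may differ.

```python
from math import prod

def contraction_cost(indices, sizes):
    """FLOPs for one step: product of the sizes of all distinct indices involved."""
    return prod(sizes[i] for i in set(indices))

n = 8
sizes = dict.fromkeys("ijkabc", n)

# Naive: one nested loop over all six indices at once.
naive = contraction_cost("ijkabc", sizes)

# Pairwise: fold one factor matrix into the tensor at a time.
steps = [
    "ijka",  # ijk,ai->ajk
    "ajkb",  # ajk,bj->abk
    "abkc",  # abk,ck->abc
]
pairwise = sum(contraction_cost(step, sizes) for step in steps)

print(f"Naive:    {naive:,}")                # 262,144
print(f"Pairwise: {pairwise:,}")             # 12,288
print(f"Speedup:  {naive / pairwise:.1f}x")  # 21.3x
```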

Budget breakdown example

Plan a multi-step computation before executing:

steps = [
    ("einsum ij,j->i", flops.einsum_cost('ij,j->i', shapes=[(256, 256), (256,)])),
    ("ReLU (maximum)", flops.pointwise_cost("maximum", shape=(256,))),
    ("sum reduction", flops.reduction_cost("sum", input_shape=(256,))),
]

total = sum(cost for _, cost in steps)
print(f"{'Operation':<20} {'FLOPs':>12}")
print("-" * 34)
for name, cost in steps:
    print(f"{name:<20} {cost:>12,}")
print("-" * 34)
print(f"{'Total':<20} {total:>12,}")
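
A plan like this can also be checked against a budget before anything runs. The sketch below is one possible way to do that. The fits_budget helper is hypothetical, and the costs are written out by hand (assuming one FLOP per index combination or element) so the snippet runs without flopscope.

```python
def fits_budget(steps, flop_budget):
    """Return (total, remaining, fits) for a list of (name, cost) pairs."""
    total = sum(cost for _, cost in steps)
    remaining = flop_budget - total
    return total, remaining, remaining >= 0

# Assumed costs for a 256x256 matrix-vector pipeline, not queried from flopscope.
steps = [
    ("einsum ij,j->i", 256 * 256),
    ("ReLU (maximum)", 256),
    ("sum reduction", 256),
]

total, remaining, fits = fits_budget(steps, flop_budget=100_000)
for name, cost in steps:
    # Share of the planned total taken by each step
    print(f"{name:<20} {cost / total:>7.1%}")
print(f"Total {total:,}, remaining {remaining:,}, fits: {fits}")
```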

Read the budget summary

Call flops.budget_summary() after your computation for a human-readable breakdown, or budget.summary() inside a context. Pass by_namespace=True when you want dotted namespace attribution:

with flops.BudgetContext(flop_budget=10_000_000) as budget:
    A = fnp.ones((256, 256))
    x = fnp.ones((256,))
    h = fnp.einsum('ij,j->i', A, x)
    h = fnp.exp(h)
    h = fnp.sum(h)
    print(budget.summary())

The summary shows cost per operation type, sorted by highest cost first. Look for operations consuming a disproportionate share of the budget. When you opt into by_namespace=True, the display adds a namespace breakdown for the exact dotted paths recorded in that run.

For programmatic analysis, use flops.budget_summary_dict():

data = flops.budget_summary_dict()
print(f"Budget: {data['flop_budget']:,}")
print(f"Used:   {data['flops_used']:,}")
print(f"Left:   {data['flops_remaining']:,}")
for op_name, op_data in data["operations"].items():
    print(f"  {op_name}: {op_data['flop_cost']:,} ({op_data['calls']} calls)")
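
Once the summary is available as a dictionary, ranking operations by their share of the used FLOPs takes only a few lines. The sketch below uses a hand-written dictionary with the same keys as the example above. Its values are illustrative, not real output, and top_operations is a hypothetical helper.

```python
def top_operations(data, limit=3):
    """Return (op_name, flop_cost, share_of_used) tuples, highest cost first."""
    used = data["flops_used"] or 1  # avoid dividing by zero on an empty run
    ranked = sorted(
        data["operations"].items(),
        key=lambda item: item[1]["flop_cost"],
        reverse=True,
    )
    return [(name, op["flop_cost"], op["flop_cost"] / used) for name, op in ranked[:limit]]

# Hand-written stand-in for the summary dictionary (illustrative values only).
sample = {
    "flop_budget": 10_000_000,
    "flops_used": 66_048,
    "flops_remaining": 9_933_952,
    "operations": {
        "exp": {"flop_cost": 256, "calls": 1},
        "einsum": {"flop_cost": 65_536, "calls": 1},
        "sum": {"flop_cost": 256, "calls": 1},
    },
}

for name, cost, share in top_operations(sample):
    print(f"{name:<8} {cost:>8,} {share:>7.1%}")
```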

Use flops.budget_summary_dict(by_namespace=True), or budget.summary_dict(by_namespace=True) on a specific context, for exact per-namespace breakdowns keyed by the full dotted path:

with flops.BudgetContext(flop_budget=1000, namespace="predict") as budget:
    x = fnp.ones((1,))
    with fnp.namespace("fallback"):
        with fnp.namespace("sampling"):
            sample = fnp.add(x, 1)

data = budget.summary_dict(by_namespace=True)
print(data["by_namespace"]["predict.fallback.sampling"]["flops_used"])
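
The dotted key in the lookup above is the root namespace joined with each nested namespace in order. The toy tracker below illustrates that composition with a simple stack. It is a hypothetical stand-in, not flopscope's implementation, and it assumes costs are attributed only to the exact path active when an operation is recorded.

```python
from contextlib import contextmanager

class NamespaceStack:
    """Toy tracker that attributes costs to the current dotted namespace path."""

    def __init__(self, root):
        self._parts = [root]
        self.flops_by_path = {}

    @contextmanager
    def namespace(self, name):
        self._parts.append(name)
        try:
            yield
        finally:
            self._parts.pop()  # restore the parent path even if the body raises

    def record(self, flop_cost):
        path = ".".join(self._parts)
        self.flops_by_path[path] = self.flops_by_path.get(path, 0) + flop_cost

tracker = NamespaceStack("predict")
with tracker.namespace("fallback"):
    with tracker.namespace("sampling"):
        tracker.record(1)  # e.g. one elementwise add on a length-1 array

print(tracker.flops_by_path)  # {'predict.fallback.sampling': 1}
```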

Add a time limit when FLOPs are not the only risk

Some operations are analytically cheap enough to fit the FLOP budget but still slow in practice. Use wall_time_limit_s on the same BudgetContext when you want a cooperative wall-clock deadline in addition to the FLOP cap:

with flops.BudgetContext(
    flop_budget=10_000_000,
    wall_time_limit_s=2.0,
    namespace="predict",
) as budget:
    # computation must stay within both limits
    ...

print(budget.summary())

When the time limit is exceeded, Flopscope raises TimeExhaustedError at the next operation boundary. The summary exposes four timing fields, the total wall time plus three components that partition it:

  • wall_time_s: total elapsed time for the context
  • flopscope_backend_time_s: time spent inside the underlying NumPy / BLAS / LAPACK backend calls being counted
  • flopscope_overhead_time_s: time spent in flopscope's own dispatch code (wrapper preambles, FLOP cost computation, view-casts, post-call wrapping, maybe_check_nan_inf when opted in via flopscope.configure(check_nan_inf=True))
  • residual_wall_time_s: the measured wall-clock remainder outside backend calls and flopscope overhead (user Python between ops, time.sleep, GC pauses, un-instrumented numpy)

The identity wall_time_s = flopscope_backend_time_s + flopscope_overhead_time_s + residual_wall_time_s holds within numerical tolerance.
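
The identity can be verified with a tolerance-based comparison. The sketch below assumes the four fields are available as keys of a dictionary. The sample values are made up for illustration, and timing_identity_holds is a hypothetical helper.

```python
import math

def timing_identity_holds(timings, rel_tol=1e-9, abs_tol=1e-9):
    """Check wall_time_s against the sum of its three components."""
    parts = (
        timings["flopscope_backend_time_s"]
        + timings["flopscope_overhead_time_s"]
        + timings["residual_wall_time_s"]
    )
    return math.isclose(timings["wall_time_s"], parts, rel_tol=rel_tol, abs_tol=abs_tol)

# Illustrative values, not measured output.
sample = {
    "wall_time_s": 1.25,
    "flopscope_backend_time_s": 0.75,
    "flopscope_overhead_time_s": 0.125,
    "residual_wall_time_s": 0.375,
}

print(timing_identity_holds(sample))  # True
```

A large residual_wall_time_s relative to the other two components suggests that the time is going to code between flopscope operations rather than to the counted operations themselves.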

Use budget.summary() when you want the current context's timings, and flops.budget_summary() when you want the accumulated session/global view.
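
A cooperative deadline is only checked between operations, so an operation that is already running finishes before the limit takes effect. The sketch below illustrates that behavior with a simulated clock. CooperativeDeadline and DeadlineExceeded are hypothetical names, not part of flopscope.

```python
import time

class DeadlineExceeded(RuntimeError):
    """Hypothetical stand-in for a time-limit error."""

class CooperativeDeadline:
    def __init__(self, limit_s, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + limit_s

    def check(self):
        """Call at each operation boundary; never interrupts running work."""
        if self._clock() > self._deadline:
            raise DeadlineExceeded("wall-clock limit exceeded")

# Simulated clock so the example is deterministic.
fake_now = [0.0]
deadline = CooperativeDeadline(2.0, clock=lambda: fake_now[0])
completed = []

def slow_op():
    fake_now[0] += 1.5  # pretend each operation takes 1.5 s
    completed.append(len(completed))

try:
    for _ in range(5):
        deadline.check()  # boundary check before starting the next operation
        slow_op()
except DeadlineExceeded:
    print(f"stopped after {len(completed)} operations")  # stopped after 2 operations
```

The second operation starts at 1.5 s and finishes at 3.0 s, past the 2.0 s limit. The overrun is only detected at the next boundary.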

Diagnose overruns

When you hit a BudgetExhaustedError, the budget's operation log gives per-call detail:

for record in budget.op_log:
    print(f"{record.op_name:<16} cost={record.flop_cost:>12,}  cumulative={record.cumulative:>12,}")

Each OpRecord contains:

Field                             Description
op_name                           Operation name (e.g., "einsum", "exp")
namespace                         Effective namespace path recorded for that operation
subscripts                        Einsum subscript string, or None
shapes                            Tuple of input shapes
flop_cost                         FLOP cost of this single call
cumulative                        Running total after this call
flopscope_context_start_offset_s  Seconds from the active BudgetContext start to when this operation was recorded
flopscope_backend_duration_s      Seconds spent in the underlying backend call for this operation
flopscope_overhead_duration_s     Seconds of flopscope wrapper/accounting overhead attributed to this operation

Look for the operation where cumulative jumps sharply -- that is your most expensive call.
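
Rather than scanning the printout by eye, the log can be searched directly. The sketch below uses a minimal stand-in record with three of the fields listed above. The Record class and the sample entries are illustrative assumptions, not real log output.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Minimal stand-in for an OpRecord with only the fields used here."""
    op_name: str
    flop_cost: int
    cumulative: int

def most_expensive(op_log):
    """Return the single call with the largest FLOP cost."""
    return max(op_log, key=lambda record: record.flop_cost)

# Illustrative log: a matvec, a pointwise exp, then a 256x256 matmul.
log = [
    Record("einsum", 65_536, 65_536),
    Record("exp", 256, 65_792),
    Record("einsum", 16_777_216, 16_843_008),
]

worst = most_expensive(log)
share = worst.flop_cost / log[-1].cumulative
print(f"{worst.op_name}: {worst.flop_cost:,} FLOPs ({share:.1%} of the total)")
```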

For real-time monitoring during long computations, use the live budget display:

with flops.budget_live():
    with flops.BudgetContext(flop_budget=10**8, namespace="training") as budget:
        for i in range(100):
            # ... computation ...
            pass
        # The live display updates automatically as FLOPs are consumed

What to do next

Once you have identified the expensive operations, apply these strategies:

  1. Reduce dimensions. If random.randn(1024, 1024) is too expensive, try smaller arrays. Halving each dimension cuts pointwise costs to 1/4, and a 512x512 matmul costs 1/8 the FLOPs of a 1024x1024 matmul because the cost scales as n^3.

  2. Exploit symmetry. If operands are symmetric, use flops.as_symmetric() to halve pointwise costs and significantly reduce einsum costs. See Symmetry Savings.

  3. Use cheaper operations. A matrix-vector product costs m*n FLOPs, while a matrix-matrix product costs m*n*k. Avoid computing full matrix products when you only need a few rows or a single vector result.

  4. Increase budget. If the computation is genuinely needed and you have headroom, raise flop_budget on the BudgetContext.

  5. Split into phases. Use namespaces to attribute different phases without splitting the FLOP budget into child budgets:

with flops.BudgetContext(flop_budget=10**8, namespace="solver") as budget:
    with fnp.namespace("init"):
        # initialization
        ...

    with fnp.namespace("solve"):
        # main computation
        ...

print(budget.summary(by_namespace=True))
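
The scaling behind strategies 1 and 3 can be checked with plain arithmetic. The sketch below assumes the same convention as the cost examples earlier (a fused multiply-add counts as one FLOP, so a matmul costs m*n*k). The helper names are hypothetical.

```python
def matmul_cost(m, n, k):
    """Matrix-matrix product of (m, n) by (n, k), one FLOP per multiply-add."""
    return m * n * k

def matvec_cost(m, n):
    """Matrix-vector product of (m, n) by (n,)."""
    return m * n

# Halving every dimension of a square matmul
ratio = matmul_cost(512, 512, 512) / matmul_cost(1024, 1024, 1024)
print(ratio)  # 0.125

# Replacing a full 1024x1024 matmul with a single matrix-vector product
saving = matmul_cost(1024, 1024, 1024) // matvec_cost(1024, 1024)
print(saving)  # 1024
```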
