Manage Your FLOP Budget
Sourced from whest-starterkit @ aaa3882.
🎯 When to use this page
Use this page to understand how FLOP budgets work and how to optimize your estimator to stay within budget.
Why FLOPs, not wall-clock time
This challenge scores estimators by analytical FLOP count, not execution time. Every mathematical operation your estimator performs is tracked by flopscope, a NumPy-compatible library that counts floating-point operations deterministically from tensor shapes.
This means your score is hardware-independent: the same estimator produces the same FLOP count on a laptop and a GPU cluster. You can focus on algorithmic efficiency rather than hardware tuning.
For the full flopscope API and cost model, see the flopscope documentation.
Which operations cost FLOPs
| Category | Examples | Cost |
|---|---|---|
| Free (0 FLOPs) | fnp.array(), fnp.zeros(), fnp.ones(), fnp.reshape(), fnp.transpose(), indexing, fnp.concatenate(), fnp.stack() | No budget impact |
| Pointwise (1 FLOP/element) | fnp.add(), fnp.multiply(), fnp.exp(), fnp.sqrt(), fnp.maximum() | Output element count |
| Reductions | fnp.sum(), fnp.mean(), fnp.max() | Input element count |
| Matrix operations | fnp.matmul(), fnp.einsum() | Depends on dimensions; typically dominates your budget |
| Random samplers | rng.standard_normal(), rng.uniform() (where rng = fnp.random.default_rng(seed)); same for module-level fnp.random.standard_normal() etc. and fnp.random.RandomState(seed) | Calibrated per method (default ~16 FLOPs/element for standard_normal) |
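The cost rules in this table can be sketched as a few functions that look only at shapes. This is a minimal pure-Python illustration, not flopscope's code. The function names are hypothetical, and the sampler rate is the approximate ~16 FLOPs/element default quoted above for standard_normal.

```python
from math import prod
from typing import Sequence

# Approximate default cost per element of rng.standard_normal(), from the table above.
STANDARD_NORMAL_FLOPS_PER_ELEMENT = 16


def free_op_flops(shape: Sequence[int]) -> int:
    """Array creation, reshape, transpose, indexing: no budget impact."""
    return 0


def pointwise_flops(out_shape: Sequence[int]) -> int:
    """add, multiply, exp, sqrt, maximum, ...: 1 FLOP per output element."""
    return prod(out_shape)


def reduction_flops(in_shape: Sequence[int]) -> int:
    """sum, mean, max, ...: charged per input element."""
    return prod(in_shape)


def standard_normal_flops(out_shape: Sequence[int]) -> int:
    """Random sampling at the approximate default per-element rate."""
    return STANDARD_NORMAL_FLOPS_PER_ELEMENT * prod(out_shape)


# Costs depend only on shapes, never on the values stored in the arrays.
print(free_op_flops((256, 256)))       # 0
print(pointwise_flops((256,)))         # 256
print(reduction_flops((8, 256)))       # 2048
print(standard_normal_flops((1000,)))  # 16000
```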
Key insight: fnp.matmul on (n, n) matrices costs O(n^3) FLOPs. For width-256 networks, a single (256, 256) @ (256, 256) matmul costs ~33M FLOPs. A matrix-vector product such as w.T @ mu costs only ~131K FLOPs (O(n^2)). Most of your budget goes to matrix operations.
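One cost formula that reproduces the ~33M figure is m * n * (2k - 1) for an (m, k) @ (k, n) product, because each output element takes k multiplies and k - 1 adds. Treat this formula as an assumption, not as flopscope's documented cost model. It also matches the 130,816 FLOPs per call implied by the matmul row of the mean-propagation profile on this page (2,093,056 FLOPs over 16 calls).

```python
def matmul_flops(m: int, k: int, n: int = 1) -> int:
    """Assumed cost of an (m, k) @ (k, n) product (n=1 for a vector).

    Each of the m * n outputs takes k multiplies and k - 1 additions.
    """
    return m * n * (2 * k - 1)


width = 256
square = matmul_flops(width, width, width)  # (256, 256) @ (256, 256): O(n^3)
matvec = matmul_flops(width, width)         # (256, 256) @ (256,):     O(n^2)
print(f"{square:,}")  # 33,488,896
print(f"{matvec:,}")  # 130,816
```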
Check your budget usage
Wrap your estimator logic in a BudgetContext to see how many FLOPs it consumes:
```python
import flopscope as flops

# `estimator` and `mlp` are your own estimator instance and the MLP being evaluated.
with flops.BudgetContext(flop_budget=68_000_000_000) as budget:
    result = estimator.predict(mlp, budget=68_000_000_000)

print(f"FLOPs used: {budget.flops_used:,}")
print(f"FLOPs remaining: {budget.flops_remaining:,}")
```

If you also want a wall-clock guardrail while debugging locally, set wall_time_limit_s on the same BudgetContext:
```python
with flops.BudgetContext(
    flop_budget=68_000_000_000,
    wall_time_limit_s=2.0,  # wall-clock limit in seconds
) as budget:
    result = estimator.predict(mlp, budget=68_000_000_000)
```

Get a per-operation breakdown
Use budget.summary() for the current explicit context or
flops.budget_summary() for the session/global view to see which operations
consume the most FLOPs:
```python
import flopscope as flops

with flops.BudgetContext(flop_budget=68_000_000_000) as budget:
    result = estimator.predict(mlp, budget=68_000_000_000)

print(budget.summary())  # breakdown for this explicit context
flops.budget_summary()   # session/global view
```

This prints a table showing each operation's name, call count, and cumulative FLOP cost, so you can identify the expensive operations to optimize.
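To make the per-operation breakdown concrete, here is a toy ledger that records what such a table aggregates: call counts and cumulative FLOPs per operation name, sorted by cost. ToyBudget, charge() and BudgetExhausted are hypothetical names used only for illustration. The real flopscope BudgetContext counts fnp calls automatically, and its internals may differ.

```python
from collections import defaultdict


class BudgetExhausted(Exception):
    """Raised when a charge would exceed the budget (a choice made for this toy)."""


class ToyBudget:
    """Hypothetical FLOP ledger for illustration; NOT flopscope's implementation."""

    def __init__(self, flop_budget: int):
        self.flop_budget = flop_budget
        self.flops_used = 0
        self.calls = defaultdict(int)   # op name -> number of calls
        self.per_op = defaultdict(int)  # op name -> cumulative FLOPs

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions

    @property
    def flops_remaining(self) -> int:
        return self.flop_budget - self.flops_used

    def charge(self, op: str, flops: int) -> None:
        """Record one call to `op` costing `flops`, refusing to overspend."""
        if flops > self.flops_remaining:
            raise BudgetExhausted(f"{op} needs {flops:,} FLOPs; {self.flops_remaining:,} left")
        self.flops_used += flops
        self.calls[op] += 1
        self.per_op[op] += flops

    def summary(self) -> str:
        """Per-operation table, most expensive operation first."""
        lines = [f"{'operation':<12}{'calls':>6}{'flops':>12}"]
        for op, flops in sorted(self.per_op.items(), key=lambda kv: kv[1], reverse=True):
            lines.append(f"{op:<12}{self.calls[op]:>6}{flops:>12,}")
        return "\n".join(lines)


with ToyBudget(flop_budget=1_000_000) as budget:
    budget.charge("matmul", 130_816)  # w.T @ mu at width 256
    budget.charge("matmul", 130_816)  # (w*w).T @ var
    budget.charge("sqrt", 256)

print(budget.summary())
print(budget.flops_used, budget.flops_remaining)  # 261888 738112
```

flopscope's real summaries also include timing fields, which this toy leaves out.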
The same summaries also show timing data:
- `wall_time_s`: total elapsed time for the context
- `flopscope_backend_time_s`: time spent inside counted flopscope backend calls
- `flopscope_overhead_time_s`: time spent inside flopscope dispatch and bookkeeping
- `residual_wall_time_s`: time spent outside flopscope backend and dispatch work
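Taken together, these definitions suggest that the residual is whatever remains of the wall time after you remove the two flopscope components. The identity below is our reading of those definitions, not a documented formula, and the timings are made up.

```python
def residual_wall_time(wall_time_s: float, backend_time_s: float, overhead_time_s: float) -> float:
    """Wall time not spent in counted flopscope backend calls or dispatch.

    Assumed identity: residual = wall - backend - overhead.
    """
    return wall_time_s - backend_time_s - overhead_time_s


# Made-up timings in seconds, for illustration only.
residual = residual_wall_time(wall_time_s=1.50, backend_time_s=0.90, overhead_time_s=0.15)
print(f"{residual:.2f}")  # 0.45
```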
In whest run, the CLI flags map to these concepts as follows:
- `--wall-time-limit`: forwards a wall-clock limit into the estimator's `BudgetContext`
- `--residual-wall-time-limit`: adds a WhestBench scoring check on the reported `residual_wall_time_s`
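A residual wall-time check of the kind --residual-wall-time-limit describes might look like the sketch below. The report dictionary layout and the function name are assumptions, as is treating a value exactly at the limit as a pass. Only the residual_wall_time_s field name comes from this page.

```python
from typing import Optional


def passes_residual_limit(report: dict, limit_s: Optional[float]) -> bool:
    """Hypothetical pass/fail check on a reported residual_wall_time_s.

    With no limit configured, the check always passes.
    """
    if limit_s is None:
        return True
    return report["residual_wall_time_s"] <= limit_s


report = {"wall_time_s": 1.2, "residual_wall_time_s": 0.3}  # made-up numbers
print(passes_residual_limit(report, 0.5))   # True
print(passes_residual_limit(report, 0.25))  # False
print(passes_residual_limit(report, None))  # True
```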
Interpret whest run output
When you run your estimator with whest run, the per-MLP report includes:
- `flops_used`: total FLOPs your estimator consumed for that MLP.
- `budget_exhausted`: `true` if your estimator exceeded the FLOP budget; in that case its predictions were zeroed.
- `final_layer_mse` / `all_layers_mse`: your prediction accuracy (lower is better).
If budget_exhausted is true, your predictions were discarded. You need to reduce FLOP usage.
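When you evaluate many MLPs, it can help to flag exhausted runs programmatically. This sketch assumes each per-MLP result is available as a Python dict keyed by the field names above. How you obtain those dicts from whest run is hypothetical, and so are the example values.

```python
def exhausted_mlps(reports: list) -> list:
    """Indices of MLPs whose predictions were zeroed for exceeding the FLOP budget."""
    return [i for i, r in enumerate(reports) if r["budget_exhausted"]]


def mean_final_mse(reports: list) -> float:
    """Average final_layer_mse over MLPs that stayed within budget."""
    ok = [r["final_layer_mse"] for r in reports if not r["budget_exhausted"]]
    return sum(ok) / len(ok) if ok else float("nan")


reports = [  # made-up values using the per-MLP field names above
    {"flops_used": 2_715_648, "budget_exhausted": False, "final_layer_mse": 0.02},
    {"flops_used": 70_000_000_000, "budget_exhausted": True, "final_layer_mse": 0.90},
    {"flops_used": 404_000_000, "budget_exhausted": False, "final_layer_mse": 0.01},
]
print(exhausted_mlps(reports))            # [1]
print(round(mean_final_mse(reports), 3))  # 0.015
```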
Worked walkthrough: mean propagation, line by line
The table below profiles examples/02_mean_propagation.py on the default Stage 1 MLP (width=256, depth=8, the same shape as the Stage 3 grader). Numbers are aggregated across all 8 layers, so the per-layer cost is roughly the row total divided by 8. To reproduce it, call summary() on a flopscope.BudgetContext (for example budget.summary()) after a single predict() call. The numbers were profiled under flopscope 0.5.0.
| Operation in `predict()` | Calls | FLOPs (total) | % of `predict()` total |
|---|---|---|---|
| `mu_pre = w.T @ mu` and `var_pre = (w*w).T @ var` (matmul) | 16 | 2,093,056 | 77.1% |
| `mu_pre * Phi_alpha + sigma_pre * phi_alpha` etc. (multiply) | 64 | 538,624 | 19.8% |
| `flops.stats.norm.pdf(alpha)` | 8 | 32,768 | 1.2% |
| `flops.stats.norm.cdf(alpha)` | 8 | 32,768 | 1.2% |
| `mu_pre * Phi_alpha + ...` etc. (add) | 24 | 6,144 | 0.2% |
| `fnp.maximum(var_pre, 1e-12)` (maximum) | 16 | 4,096 | 0.2% |
| `fnp.sqrt(var_pre)` | 8 | 2,048 | 0.1% |
| `mu_pre / sigma_pre` (true_divide) | 8 | 2,048 | 0.1% |
| `ez2 - mu*mu` (subtract) | 8 | 2,048 | 0.1% |
| `fnp.stack(rows, axis=0)` | 1 | 2,048 | 0.1% |
| Total per `predict()` | | 2,715,648 | 100% |
The full ~2.7M FLOPs use only ~0.004% of the default 6.8e10 grader budget, so mean propagation lands well below the multiplier floor at this shape (see Scoring Model).
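You can rebuild the profile's totals from per-call costs. The per-layer mix below is our own decomposition, chosen because it reproduces every row. For example, 8 multiply calls per layer only reach 67,328 FLOPs if one of them is the elementwise w*w over the full (256, 256) matrix. Likewise, norm.pdf and norm.cdf work out to 16 FLOPs per element. The matvec cost of 256 * (2 * 256 - 1) = 130,816 per call is inferred from the matmul row, not taken from flopscope's documentation.

```python
WIDTH, DEPTH = 256, 8
BUDGET = 68_000_000_000  # default grader budget (6.8e10)

matvec = WIDTH * (2 * WIDTH - 1)  # inferred cost of one (256, 256) @ (256,) product

per_layer = {
    "matmul": 2 * matvec,                   # w.T @ mu and (w*w).T @ var
    "multiply": WIDTH * WIDTH + 7 * WIDTH,  # w*w plus seven vector products
    "norm.pdf": 16 * WIDTH,
    "norm.cdf": 16 * WIDTH,
    "add": 3 * WIDTH,
    "maximum": 2 * WIDTH,
    "sqrt": WIDTH,
    "true_divide": WIDTH,
    "subtract": WIDTH,
}

# All layers, plus the single fnp.stack over an (8, 256) result charged in this profile.
total = DEPTH * sum(per_layer.values()) + DEPTH * WIDTH
print(f"{total:,}")                    # 2,715,648
print(f"{100 * total / BUDGET:.4f}%")  # 0.0040%
```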
Two takeaways:
- **`matmul` dominates.** About 77% of `predict()` cost is the two matmuls per layer. The `multiply` row accounts for most of the remaining ~20%; it covers the elementwise `w*w` on each weight matrix as well as the pointwise ReLU-moment terms. Halving the matmul count (e.g., switching to a diagonal-only formulation, or fusing into a single `einsum` as `examples/03_covariance_propagation.py` does for its symmetric covariance update) is the biggest single lever.
- **Pointwise ops and reductions are negligible in practice.** Operations such as `sqrt`, divides and `maximum` cost one FLOP per neuron, which is a few hundred FLOPs per layer at width 256. Don't twist your code to avoid them.
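Here is a back-of-the-envelope check on the first takeaway, using only the table's totals. The hypothetical change below drops one of the two matmuls per layer and leaves every other row unchanged. In a real rewrite, related terms such as w*w could shrink too.

```python
# Totals taken from the profiling table above.
TOTAL = 2_715_648
MATMUL = 2_093_056  # both matmuls, all 8 layers

# Hypothetical: drop one of the two matmuls per layer, keep every other row as-is.
after = TOTAL - MATMUL // 2
saved = 1 - after / TOTAL
print(f"{after:,} FLOPs, {saved:.1%} saved")  # 1,669,120 FLOPs, 38.5% saved
```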
The same pattern holds for examples/03_covariance_propagation.py. There, the O(width³) symmetry-aware einsum lands at ~404M FLOPs per predict() (~0.6% of the grader budget). That is ~150× more than mean propagation, because propagating a full covariance is genuinely heavier than propagating a diagonal variance, but it still leaves plenty of headroom.
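The ratio and budget share quoted above follow from simple arithmetic. The covariance total used here is the approximate ~404M figure, so the results are approximate too.

```python
MEAN_PROP = 2_715_648    # examples/02_mean_propagation.py, exact total from the table
COV_PROP = 404_000_000   # examples/03_covariance_propagation.py, approximate (~404M)
BUDGET = 68_000_000_000  # default grader budget

print(f"{COV_PROP / MEAN_PROP:.0f}x more FLOPs")          # 149x more FLOPs
print(f"{100 * COV_PROP / BUDGET:.2f}% of the budget")    # 0.59% of the budget
print(f"{BUDGET // COV_PROP} predict() calls would fit")  # 168 predict() calls would fit
```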
Optimization tips
- **Matmul dominates.** Each `fnp.matmul(W.T, mu)` on a `(width, width)` matrix costs `O(width^2)` FLOPs per layer. Reducing the number of matmuls (or their dimensions) has the biggest impact.
- **Diagonal approximations save FLOPs.** Mean propagation uses diagonal variance (`O(width^2)` per layer) instead of full covariance propagation (`O(width^3)` per layer). Choose the level of approximation that fits your budget.
- **Array creation is free.** `fnp.array()`, `fnp.zeros()`, `fnp.ones()` and `fnp.eye()` cost 0 FLOPs. Storing precomputed intermediate values therefore costs nothing beyond the arithmetic that produced them.
- **Pick one strategy per estimator.** Use either mean propagation or full covariance as your default implementation, then optimize it for the fixed budget.
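To see why the diagonal approximation matters more as networks widen, the sketch below compares per-layer matmul-style costs for the two strategies. Both functions are hypothetical simplifications that reuse an assumed m * n * (2k - 1) matmul cost. The diagonal version counts only the two matrix-vector products plus w*w. The full-covariance version is a naive W.T @ Sigma @ W built from two (n, n) @ (n, n) products, with no symmetry savings. At width 256 and depth 8, this naive count comes to roughly 536M FLOPs. That is higher than the ~404M reported above for the symmetry-aware einsum in examples/03_covariance_propagation.py.

```python
def diag_layer_flops(n: int) -> int:
    """Mean propagation per layer, counting only the two matvecs and w*w: O(n^2)."""
    matvec = n * (2 * n - 1)  # assumed (n, n) @ (n,) cost
    return 2 * matvec + n * n


def full_cov_layer_flops(n: int) -> int:
    """Naive W.T @ Sigma @ W as two (n, n) @ (n, n) products: O(n^3)."""
    return 2 * n * n * (2 * n - 1)


for n in (64, 256, 1024):
    d, f = diag_layer_flops(n), full_cov_layer_flops(n)
    print(f"width={n:>4}: diagonal {d:>11,}  full {f:>14,}  ratio ~{f / d:,.0f}x")
```

The ratio grows roughly linearly with width, which is why full covariance becomes expensive much faster than the diagonal approach.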