Debugging Checklist
Sourced from whest-starterkit @
aaa3882.
Debugging Checklist
Use this page when your estimator runs but the score is bad, or something feels wrong. Work through the tiers in order.
Tier 0: Pure-Python inner loop (fastest iteration)
For fast, no-framework iteration — e.g. to print intermediate activations, attach pdb, or sweep Monte Carlo sample counts — run your estimator as a plain Python script instead of going through whest run. The repo-root estimator.py is exactly this kind of self-contained loop: it constructs an MLP via local_engine.build_mlp, invokes the inline Estimator, and prints a FLOPs-vs-MSE convergence table. It's runnable two ways:
# 1) Direct: no CLI, no runner, no subprocess — just Python.
uv run python estimator.py
# 1b) Same file, with a side-by-side baseline comparison:
uv run python estimator.py --baseline mean_propagation
# 2) Scored via whestbench (same file, same class — honors BaseEstimator):
uv run whest run --estimator estimator.pyEdit predict() in estimator.py and re-run. See Stage 1 for the full walkthrough.
Tier 1: Sanity checks (2 minutes)
Run validation:
whest validate --estimator estimator.pyIf it fails, check:
- Output shape: does
predict()return shape(mlp.depth, mlp.width)? - Finite values: are all values finite? Check for
nanorinfin your math. - Class name: is your class named
Estimator? The loader looks for this by default.
Tier 2: Correctness checks (5 minutes)
Run your estimator and look at the report:
whest run --estimator estimator.py --n-mlps 3 --runner local --debugCheck:
- Did
predict()raise? Ifwhest runexits with status1and prints an "Estimator Errors" panel, your estimator raised an exception. Use--debugto include tracebacks inline in the panel, or add--fail-fastto halt at the first failure and let the raw Python traceback propagate. - Does zeros beat you? If returning
fnp.zeros((mlp.depth, mlp.width))scores better than your estimator, your predictions are wrong in a way that's worse than guessing zero. - Is
budget_exhaustedtrue? If so, your estimator exceeded the FLOP budget and all predictions were zeroed. See Manage Your FLOP Budget. - Are errors concentrated at deep layers? Run with
--debugand compareall_layers_mse— if early layers are good but later layers are bad, your propagation may accumulate errors.
Tier 3: Optimization checks (10+ minutes)
Profile your FLOP usage:
import flopscope as flops
with flops.BudgetContext(flop_budget=68_000_000_000) as budget:
result = estimator.predict(mlp, budget=68_000_000_000)
flops.budget_summary()Check:
- Is matmul dominant? If >90% of FLOPs are in matmul, consider diagonal variance instead of full covariance.
- Redundant computation? Are you computing something in a loop that could be precomputed once?
- Free operations wasted? Remember:
fnp.zeros,fnp.transpose,fnp.reshape, indexing cost 0 FLOPs.
Using pdb / breakpoint() inside your estimator
The interactive progress display can mask the debugger prompt when you drop a breakpoint inside predict(). Use one of the following patterns:
-
Recommended — use
breakpoint()rather thanpdb.set_trace(). The CLI installs a hook that pauses the live display before the debugger starts, so the prompt appears cleanly:def predict(self, mlp, budget): breakpoint() ... -
With
pdb.set_trace()— pass--format plainto disable the live display entirely:whest run --estimator estimator.py --runner local --format plain -
Or set the standard env var before running:
PYTHONBREAKPOINT=pdb.set_trace whest run --estimator ./... --runner localThe CLI auto-detects this and switches to plain output automatically.
Debugging is best supported with
--runner local.--runner local(or--runner inprocess) runs in-process for direct traces and interactive debugging. The isolation runners (--runner subprocess, legacy--runner server) communicate via worker protocol I/O, so interactive debuggers should be used in local mode.