Debugging Checklist

Sourced from whest-starterkit @ aaa3882.

Use this page when your estimator runs but the score is bad, or something feels wrong. Work through the tiers in order.

Tier 0: Pure-Python inner loop (fastest iteration)

For fast, framework-free iteration (for example, printing intermediate activations, attaching pdb, or sweeping Monte Carlo sample counts), run your estimator as a plain Python script instead of going through whest run. The repo-root estimator.py is exactly this kind of self-contained loop: it constructs an MLP via local_engine.build_mlp, invokes the inline Estimator, and prints a FLOPs-vs-MSE convergence table. You can run it in two ways:

# 1) Direct: no CLI, no runner, no subprocess — just Python.
uv run python estimator.py

# 1b) Same file, with a side-by-side baseline comparison:
uv run python estimator.py --baseline mean_propagation

# 2) Scored via whestbench (same file, same class — honors BaseEstimator):
uv run whest run --estimator estimator.py

Edit predict() in estimator.py and re-run. See Stage 1 for the full walkthrough.
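
The sample-count sweep mentioned above can be sketched in plain NumPy. This is not the repo's estimator.py: the bias-free ReLU layers, the standard-normal inputs, and the helper names mc_layer_means, sweep, and demo are all assumptions made for illustration, and the reference is simply a much larger Monte Carlo run.

```python
import numpy as np

def mc_layer_means(weights, n_samples, rng):
    """Monte Carlo estimate of mean post-ReLU activations, shape (depth, width)."""
    width = weights[0].shape[1]
    x = rng.standard_normal((n_samples, width))  # assumed input distribution
    means = []
    for w in weights:
        x = np.maximum(x @ w.T, 0.0)  # assumed layer: ReLU(W x), no bias
        means.append(x.mean(axis=0))
    return np.stack(means)

def sweep(weights, sample_counts, reference, seed=0):
    """Return one (n_samples, mse) row per sample count."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in sample_counts:
        est = mc_layer_means(weights, n, rng)
        rows.append((n, float(np.mean((est - reference) ** 2))))
    return rows

def demo(depth=3, width=16):
    rng = np.random.default_rng(42)
    # He-style scaling keeps activations at a comparable scale across layers.
    weights = [rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
               for _ in range(depth)]
    reference = mc_layer_means(weights, 200_000, rng)  # stand-in for ground truth
    return sweep(weights, [10, 100, 1_000, 10_000], reference)

if __name__ == "__main__":
    for n, mse in demo():
        print(f"{n:>7d} samples  mse={mse:.3e}")
```

If the MSE does not shrink as the sample count grows, the estimate is probably biased rather than merely noisy, which is worth ruling out before moving on to the later tiers.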

Tier 1: Sanity checks (2 minutes)

Run validation:

whest validate --estimator estimator.py

If it fails, check:

Output shape: does predict() return shape (mlp.depth, mlp.width)?
Finite values: are all values finite? Check for nan or inf in your math.
Class name: is your class named Estimator? The loader looks for this by default.
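
The shape and finiteness checks above can also be run by hand before invoking the validator. The helper below is a minimal sketch using plain NumPy; check_prediction is a hypothetical name, not part of whest, and it assumes the value returned by predict() can be converted with np.asarray.

```python
import numpy as np

def check_prediction(pred, depth, width):
    """Return a list of problems; an empty list means the output looks sane."""
    problems = []
    arr = np.asarray(pred, dtype=float)
    if arr.shape != (depth, width):
        problems.append(f"shape {arr.shape} != expected {(depth, width)}")
    n_bad = int(np.count_nonzero(~np.isfinite(arr)))
    if n_bad:
        problems.append(f"{n_bad} non-finite value(s) (nan or inf)")
    return problems
```

For example, calling check_prediction(estimator.predict(mlp, budget), mlp.depth, mlp.width) in a scratch script and printing the result shows both kinds of failure at once.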

Tier 2: Correctness checks (5 minutes)

Run your estimator and look at the report:

whest run --estimator estimator.py --n-mlps 3 --runner local --debug

Check:

Did predict() raise? If whest run exits with status 1 and prints an "Estimator Errors" panel, your estimator raised an exception. Use --debug to include tracebacks inline in the panel, or add --fail-fast to halt at the first failure and let the raw Python traceback propagate.
Does zeros beat you? If returning fnp.zeros((mlp.depth, mlp.width)) scores better than your estimator, your predictions are wrong in a way that's worse than guessing zero.
Is budget_exhausted true? If so, your estimator exceeded the FLOP budget and all predictions were zeroed. See Manage Your FLOP Budget.
Are errors concentrated at deep layers? Run with --debug and compare all_layers_mse — if early layers are good but later layers are bad, your propagation may accumulate errors.
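
To make the zeros comparison and the per-layer check concrete, here is a small sketch in plain NumPy. It assumes you have some reference array of shape (depth, width) to compare against, for instance a large Monte Carlo run of your own; the function names are hypothetical and the numbers it produces are not guaranteed to match the all_layers_mse values in the whest report.

```python
import numpy as np

def per_layer_mse(pred, target):
    """Mean squared error per layer; both inputs have shape (depth, width)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((pred - target) ** 2, axis=1)

def compare_to_zeros(pred, target):
    """Per-layer MSE of the estimator next to an all-zeros prediction."""
    target = np.asarray(target, dtype=float)
    est = per_layer_mse(pred, target)
    zeros = per_layer_mse(np.zeros_like(target), target)
    return {"estimator": est, "zeros": zeros, "worse_than_zeros": est > zeros}
```

A worse_than_zeros mask that is False for the first layers and True for the last ones points at error accumulation during propagation rather than a bug in a single layer.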

Tier 3: Optimization checks (10+ minutes)

Profile your FLOP usage:

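import flopscope as flops

FLOP_BUDGET = 68_000_000_000  # one constant for both the context and predict()

# `estimator` and `mlp` are assumed to be constructed already.
with flops.BudgetContext(flop_budget=FLOP_BUDGET) as budget:
    result = estimator.predict(mlp, budget=FLOP_BUDGET)
    flops.budget_summary()  # summarize FLOPs spent inside this context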

Check:

Is matmul dominant? If more than 90% of FLOPs are in matmul, consider propagating only the diagonal variance instead of the full covariance matrix.
Redundant computation? Are you computing something inside a loop that could be precomputed once?
Are you using the free operations? fnp.zeros, fnp.transpose, fnp.reshape, and indexing cost 0 FLOPs.
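
The diagonal-variance suggestion can be illustrated with a short NumPy sketch. It assumes a linear layer of the form y = W x and uses hypothetical function names; it shows the arithmetic only, and does not verify how flopscope would charge these operations.

```python
import numpy as np

def full_covariance_step(w, cov):
    """Propagate a full covariance through y = W x: two matmuls, O(n^3)."""
    return w @ cov @ w.T

def diagonal_variance_step(w_squared, var):
    """Propagate per-neuron variances only: one matrix-vector product, O(n^2).

    `w_squared` is the elementwise square of W, computed once per layer
    rather than on every call.
    """
    return w_squared @ var
```

When the input covariance is diagonal, the diagonal step reproduces the diagonal of the full result exactly. Otherwise it drops the correlations between neurons, so it trades accuracy for the lower cost.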

Using `pdb` / `breakpoint()` inside your estimator

The interactive progress display can mask the debugger prompt when you drop a breakpoint inside predict(). Use one of the following patterns:

Recommended — use breakpoint() rather than pdb.set_trace(). The CLI installs a hook that pauses the live display before the debugger starts, so the prompt appears cleanly:
```
def predict(self, mlp, budget):
    breakpoint()
    ...
```
With pdb.set_trace() — pass --format plain to disable the live display entirely:
```
whest run --estimator estimator.py --runner local --format plain
```
Or set the standard env var before running:
```
PYTHONBREAKPOINT=pdb.set_trace whest run --estimator ./... --runner local
```
The CLI auto-detects this and switches to plain output automatically.
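
Building on the breakpoint() pattern, a conditional breakpoint avoids stopping on every call to predict(). The helper below is a sketch with a hypothetical name; it relies only on the built-in breakpoint(), which dispatches through sys.breakpointhook and therefore honors PYTHONBREAKPOINT.

```python
import numpy as np

def break_if_nonfinite(name, values):
    """Enter the debugger only when `values` contains nan or inf."""
    arr = np.asarray(values, dtype=float)
    if not np.isfinite(arr).all():
        print(f"{name}: non-finite values detected")
        breakpoint()  # dispatches through sys.breakpointhook
    return values
```

Because the debugger stops inside the helper, type up once at the pdb prompt to reach the calling frame in predict().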

Interactive debugging works best with --runner local (or --runner inprocess), which runs your estimator in-process so tracebacks and debugger prompts reach your terminal directly. The isolation runners (--runner subprocess and the legacy --runner server) communicate with the estimator through worker protocol I/O, so switch to local mode whenever you need an interactive debugger.

➡️ Next step

Algorithm Ideas
