whestbench.

Write an Estimator

Sourced from whest-starterkit @ aaa3882.

Write an Estimator

← Documentation

🎯 When to use this page

Use this page when implementing your custom participant estimator.

🚀 Do this now

Start from examples/01_random.py, then replace the prediction logic.

Minimal structure:

from __future__ import annotations

import flopscope.numpy as fnp

from whestbench import BaseEstimator, MLP


class Estimator(BaseEstimator):
    def predict(self, mlp: MLP, budget: int) -> fnp.ndarray:
        return fnp.zeros((mlp.depth, mlp.width))

✅ Expected outcome

Your estimator implements predict(mlp, budget) and returns a (depth, width) array of predicted neuron values.

MLP traversal starter

If you need exact MLP field semantics or weight matrices, use:

Contract checklist

  • return a (mlp.depth, mlp.width) array,
  • all values must be finite.

⚠️ Common first failure

Symptom: estimator returns wrong shape.

Fix: ensure predict returns a 2D array with shape (mlp.depth, mlp.width) and all finite values.


Building your first estimator

Step 1: The zeros baseline

The template estimator returns all zeros. Run it to see what a bad score looks like:

uv run whest run --estimator estimator.py --n-mlps 3

Look at final_layer_mse — the raw accuracy of predicting all zeros — and adjusted_final_layer_score, the budget-scaled metric the leaderboard ranks on. The zeros baseline is your accuracy floor.

Step 2: Mean propagation

Copy the mean propagation example — it uses the ReLU expectation formula:

cp examples/02_mean_propagation.py estimator.py
uv run whest run --estimator estimator.py --n-mlps 3

Compare adjusted_final_layer_score (and the underlying final_layer_mse) to the zeros baseline. Mean propagation uses the network's weights to make informed predictions, so it should score significantly better.

Step 3: Understand the score report

The report shows per-MLP results:

  • final_layer_mse: your raw accuracy on the final layer (the diagnostic that drives your score)
  • adjusted_final_layer_score: final_layer_mse scaled by your compute use — the leaderboard ranking metric
  • flops_used: how many FLOPs your estimator consumed
  • budget_exhausted: whether you exceeded the budget (predictions zeroed if true)
  1. examples/01_random.py — the interface
  2. examples/02_mean_propagation.py — simplest real algorithm
  3. examples/03_covariance_propagation.py — more accurate, more expensive
  4. estimator.py — the repo-root template, runnable two ways: uv run python estimator.py for the pure-local pedagogical loop (see Stage 1) and uv run whest run --estimator estimator.py for the harness path. Copy when you want a minimal iteration loop.
  5. Algorithm Ideas — full survey of strategies
  6. Performance Tips — FLOP optimization patterns

➡️ Next step

On this page