CLI
whest run
Run local evaluation for an estimator.
whest run [options]

| Option | Default | Description |
|---|---|---|
| `--estimator` | | Path to estimator.py (see https://github.com/AIcrowd/whest-starterkit for starter files). |
| `--class` | auto-detected | Estimator class name to load from the estimator file; auto-detected if omitted. |
| `--runner` | 'local' | Execution backend: 'local'/'inprocess' run in-process; 'subprocess'/'server' run in an isolated subprocess. |
| `--n-mlps` | 10, or the dataset size | Number of MLPs to evaluate. Defaults to 10 when `--dataset` is not provided; otherwise the full dataset size. Clamped to the dataset size when `--dataset` is set and `--n-mlps` exceeds it. |
| `--detail` | 'raw' | Report verbosity: 'raw' for a concise summary or 'full' for expanded per-MLP detail. |
| `--profile` | | Collect and display per-MLP FLOP/budget profiling breakdowns in the report. |
| `--show-diagnostic-plots` | | Include diagnostic plot panes in the rendered (non-JSON) report. |
| `--format` | | Output format: rich, plain, or json. |
| `--json` | | Alias for `--format json`. |
| `--dataset` | | Path to a baked dataset directory, or `hf://owner/repo[@revision]` for HF Hub. |
| `--streaming` | | Stream the dataset from HF instead of downloading it. Iteration-only (no random access). Data is NOT cached, so subsequent runs re-fetch it. Useful for small `--n-mlps` debugging runs. See docs/guides/datasets.md#streaming-mode. |
| `--revision` | | HF Hub revision (tag or commit SHA) for `--dataset`. |
| `--split` | the only split (single-split datasets) | For multi-split datasets, the split to evaluate. Required when the dataset is multi-split; optional when it is single-split. |
| `--flop-budget` | 68_000_000_000 (6.8e10) | Effective compute budget per MLP in FLOPs. Caps `C_m = F_m + lambda*R_m` (analytical FLOPs plus charged residual wall time). Always honored; any flop_budget stored in the `--dataset` metadata is ignored. |
| `--lambda-flops-per-second` | 1e11 | Residual wall-time penalty rate lambda in `C_m = F_m + lambda*R_m` (FLOP-equivalents per second of residual wall time). |
| `--n-samples` | `width*width*256` | Ground truth samples per MLP. Lower values speed up generation at the cost of noisier scores. |
| `--debug` | | Show full Python tracebacks for errors instead of condensed messages. |
| `--fail-fast` | | Stop on the first estimator error and let the raw Python traceback propagate (combine with `--debug` to show it). |
| `--wall-time-limit` | 60.0 | Wall-clock time limit per predict call, in seconds. |
| `--residual-wall-time-limit` | unlimited | Time limit for non-flopscope operations per predict call. |
| `--seed` | omitted | Random seed for the run. Without `--dataset`, seeds both MLP generation and estimator setup. With `--dataset`, MLP seeds come from the dataset; this flag seeds estimator setup only. When omitted, ctx.seed defaults to 0 and run_config.seed is null in the JSON output. |
| `--max-threads` | | Limit BLAS to at most N CPU threads. |
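
The budget formula behind `--flop-budget` and `--lambda-flops-per-second` can be worked through by hand. The sketch below is only an illustration of `C_m = F_m + lambda*R_m` using the two documented defaults; the function names are hypothetical and are not part of whest, and treating the cap as inclusive (`<=`) is an assumption.

```python
# Illustrative only: these helpers are hypothetical, not whest APIs.
# The formula and the two default values are taken from the option table.

DEFAULT_FLOP_BUDGET = 68_000_000_000        # --flop-budget default (6.8e10)
DEFAULT_LAMBDA_FLOPS_PER_SECOND = 1e11      # --lambda-flops-per-second default


def effective_compute(analytical_flops, residual_seconds,
                      lambda_flops_per_second=DEFAULT_LAMBDA_FLOPS_PER_SECOND):
    """Return C_m = F_m + lambda * R_m for a single MLP."""
    return analytical_flops + lambda_flops_per_second * residual_seconds


def within_budget(analytical_flops, residual_seconds,
                  flop_budget=DEFAULT_FLOP_BUDGET,
                  lambda_flops_per_second=DEFAULT_LAMBDA_FLOPS_PER_SECOND):
    """Check C_m against the budget (inclusive comparison is an assumption)."""
    cost = effective_compute(analytical_flops, residual_seconds,
                             lambda_flops_per_second)
    return cost <= flop_budget


# 1e10 analytical FLOPs plus 0.5 s of residual wall time:
# C_m = 1e10 + 1e11 * 0.5 = 6e10, which fits under 6.8e10.
print(effective_compute(1e10, 0.5))   # 60000000000.0
print(within_budget(1e10, 0.5))       # True

# The same FLOPs with 1.0 s of residual wall time: C_m = 1.1e11, over budget.
print(within_budget(1e10, 1.0))       # False
```

With both defaults in place, the residual term alone reaches the budget at 6.8e10 / 1e11 = 0.68 seconds, so an estimator that spends most of its time outside counted operations can exhaust the budget even with few analytical FLOPs.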