whestbench.
Participant GuideGetting started

Stage 4: Subprocess Runner

Sourced from whest-starterkit @ aaa3882.

Stage 4: Subprocess Runner

← Tutorial

Ladder: 1 · 2 · 3 · 4 · 5

Stage 3 runs in your interpreter. Stage 4 spawns each estimator call in a fresh subprocess — the same isolation the grader uses. Catches:

  • Shared global state between calls
  • Stale RNG seeds from previous calls
  • Memory leaks
  • Imports that fail in a clean process

🚀 Run it

uv run whest run --estimator estimator.py --dataset hf://aicrowd/arc-whestbench-public-2026 --split mini --runner subprocess

Same score format as Stage 3. If your score drops noticeably, you've found a bug masked by in-process state.

✅ Expected outcome

Your Stage 4 adjusted_final_layer_score should match Stage 3 exactly — the Mini split fixes the MLPs and bakes the ground truth at N=1e9, so there is no Monte-Carlo noise between the two runs. If Stage 4 differs, you've found a bug masked by in-process state.

If Stage 4 is worse than Stage 3, the most likely culprits are:

  1. Module-level mutable statesetup() populated a global that persists between MLPs in-process but resets in subprocess workers.
  2. Caches keyed on object identityid() collisions in-process accidentally hit cached results; subprocess invalidates that.
  3. RNG seeded once at import time — survives between in-process calls; subprocess re-seeds on every call.

Move state into the Estimator instance (or stash it on the SetupContext.scratch_dir) and re-run.

✅ When you're ready

Move on to Stage 5: Package your submission.

On this page