Stage 5: Package Your Submission
Sourced from whest-starterkit @ aaa3882.
You've climbed the ladder. Now ship it.
Before you click "submit", run through the Pre-Submission Checklist — it's one screen, all commands, and catches the bugs the grader will hit.
🚀 Run it
uv run whest package --estimator estimator.py --output submission.tar.gz

This produces submission.tar.gz containing your estimator.py, the resolved whestbench version, and any imports your estimator needs (auto-detected).
📤 Submit to AIcrowd
Ship it straight from the CLI — no manual portal upload needed.
First, log in once with your AIcrowd API key (grab it from your AIcrowd profile):
uv run whest login

Then submit. whest submit packages estimator.py and uploads it to the challenge in one step (you can also submit a prebuilt tarball):
# package + submit in one go
uv run whest submit --estimator estimator.py
# or submit a tarball you already built
uv run whest submit submission.tar.gz

Add --watch to follow the submission until it's graded:
uv run whest submit --estimator estimator.py --watch

Prefer the browser? The packaged submission.tar.gz still uploads fine on the AIcrowd challenge submission page.
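If you would rather build the tarball first, inspect it, and only then upload, the two documented commands can be chained from a script. The following is a minimal sketch, not part of the starter kit: the helper names `package_cmd`, `submit_cmd`, and `package_then_submit` are hypothetical, and it assumes you have already run `uv run whest login` once. It uses only the commands and flags shown above.

```python
import subprocess


def package_cmd(estimator="estimator.py", output="submission.tar.gz"):
    """Build the argv for the documented packaging command."""
    return ["uv", "run", "whest", "package", "--estimator", estimator, "--output", output]


def submit_cmd(tarball="submission.tar.gz"):
    """Build the argv for submitting a prebuilt tarball."""
    return ["uv", "run", "whest", "submit", tarball]


def package_then_submit(run=subprocess.run):
    """Package, then submit; stop at the first command that exits non-zero."""
    for cmd in (package_cmd(), submit_cmd()):
        # check=True raises CalledProcessError on a non-zero exit code.
        run(cmd, check=True)
```

Calling `package_then_submit()` runs both steps in order. The `run` parameter exists only so the sequence can be dry-run with a stub in place of `subprocess.run`.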
What's in the artifact
- estimator.py: verbatim copy of yours
- manifest.json: entrypoint, whestbench/flopscope/numpy versions, Python version, per-file SHA-256, and package timestamp
- requirements.txt: only when your estimator pulls in extra packages (frozen from your uv.lock)
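To see what actually went into the artifact before you upload it, you can list its members and hash them yourself. This is a minimal sketch using only the Python standard library. The function name `sha256_of_members` is hypothetical, and because the key layout of manifest.json is not described here, the sketch does not parse the manifest: compare its output against the per-file SHA-256 entries by eye.

```python
import hashlib
import tarfile


def sha256_of_members(tar_path):
    """Return {member name: SHA-256 hex digest} for each regular file in a .tar.gz."""
    digests = {}
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, and other non-file entries
            with tar.extractfile(member) as fh:
                digests[member.name] = hashlib.sha256(fh.read()).hexdigest()
    return digests
```

For example, `sha256_of_members("submission.tar.gz")` returns one entry per packaged file. An unexpected member, or a digest for estimator.py that differs from your working copy, is worth investigating before you submit.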
After submission
What happens once whest submit (or a portal upload) accepts your
submission.tar.gz:
- AIcrowd unpacks the artifact into a clean grader container that pre-installs the runner's whestbench release plus the contents of your requirements.txt.
- The grader runs your estimator against a held-out MLP suite (same width, depth, flop_budget as the public defaults; same n_mlps order of magnitude), in an isolated subprocess inside a sandboxed container. No network, no GPU, no access to the local filesystem outside SetupContext.scratch_dir.
- Your setup() runs once. If it raises, the run is recorded as a failed submission with the traceback surfaced in the AIcrowd UI.
- predict() is called per MLP. Errors per call are captured but don't kill the run; predictions for that MLP are scored against zeros. Repeated failures will tank adjusted_final_layer_score.
- The leaderboard updates with adjusted_final_layer_score once the run finishes.
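The control flow described in the steps above can be pictured with a toy loop. This is an illustrative sketch only, not the grader's code: `run_suite`, the argument-free `setup()`, the single-argument `predict(mlp)`, and the `output_size` parameter are all simplifying assumptions, and "scored against zeros" is read here as substituting an all-zero prediction for the failed call.

```python
import traceback


def run_suite(estimator, mlps, output_size):
    """Toy model: setup() once, predict() per MLP, per-call errors become zeros."""
    try:
        estimator.setup()
    except Exception:
        # A setup() failure fails the whole run; the traceback is kept for display.
        return {"status": "failed", "traceback": traceback.format_exc(), "predictions": []}

    predictions = []
    errors = 0
    for mlp in mlps:
        try:
            predictions.append(estimator.predict(mlp))
        except Exception:
            # A predict() failure is counted, and the run continues.
            errors += 1
            predictions.append([0.0] * output_size)  # assumed zero placeholder
    return {"status": "ok", "errors": errors, "predictions": predictions}
```

The asymmetry is the point: an exception in `setup()` ends the run immediately, while an exception in `predict()` costs only that one MLP, so wrapping risky per-MLP logic in your own fallback is usually cheaper than letting it raise.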
If the leaderboard score disagrees with your Stage 4 score by more than a percent or two, the suspects are listed in the FAQ.
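"A percent or two" can be turned into an explicit check. The helper below is a hypothetical sketch: the name `scores_agree` and the 2% default are assumptions chosen to match the rule of thumb above, and the difference is measured relative to the local Stage 4 score.

```python
def scores_agree(local_score, leaderboard_score, rel_tol=0.02):
    """True when the leaderboard score is within rel_tol of the local score."""
    if local_score == 0:
        # Relative difference is undefined at zero; require an exact match.
        return leaderboard_score == 0
    return abs(leaderboard_score - local_score) <= rel_tol * abs(local_score)
```

With the default tolerance, a local score of 100.0 and a leaderboard score of 101.5 count as agreeing, while 100.0 against 103.0 does not.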
If you suspect a grader-side issue (your submission errors out without your local Stage 4 doing so), open a thread on the challenge discussion forum with the submission ID — that's the quickest path to a human.
✅ Expected outcome
| Stage | What you should see | Action if not |
|---|---|---|
| Local Stage 4 score | ≈ leaderboard score within ~1–2% | Check Stage 4 vs Stage 3 first — drift between them surfaces the same bugs that the grader will hit |
| submission.tar.gz size | Typically 2–10 KB without external deps; up to a few MB with bundled wheels | If much larger, audit requirements.txt |
| Grader runtime | A few minutes for the default suite | Slower than that suggests residual_wall_time_s issues — see score-report-fields.md |