whest dataset

Dataset bake/publish/load/merge/inspect commands.

whest dataset [options]

whest dataset bake

Bake a new dataset to a directory.

| Option | Default | Description |
|---|---|---|
| --n-mlps | | Total number of MLPs in the logical dataset. |
| --n-samples | | |
| --width | | |
| --depth | | |
| --mlp-seeds | | Path to a JSON file containing an array of N explicit per-MLP seeds (each a non-negative integer < 2**63). If omitted, seeds are auto-generated via secrets.randbits(63). See docs/reference/dataset-format.md. |
| --split | 'public' | Split name. Must match [a-z][a-z0-9-]* (the HF Hub split-name convention). |
| --config | 'default' | HF dataset config name for this split. Use this when authoring config-per-split datasets. |
| --output | | Output directory (must not already exist). |
| --torch | | Use the GPU/torch backend. |
| --device | 'auto' | |
| --mlps-per-batch | | |
| --chunk-size | | |
| --slice | | K/N: bake slice K of N (K is 0-indexed). |
| --mlp-range | | START-END, inclusive on both ends (e.g. 0-249). |
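The option rules above are cheap to check before launching a long bake. The Python sketch below (standard library only) writes a seeds file the same way the documented default draws seeds and validates the --mlp-seeds, --split, --mlp-range and --slice formats. The helper names are hypothetical and not part of whestbench, and the sketch assumes that N, the number of seeds, equals --n-mlps; docs/reference/dataset-format.md remains the authority on the seeds file format.

```python
import json
import re
import secrets
from pathlib import Path

# Constraints stated in the table above.
SEED_LIMIT = 2**63                              # seeds must be < 2**63
SPLIT_NAME_RE = re.compile(r"[a-z][a-z0-9-]*")  # --split naming rule


def write_seeds_file(path, n_mlps):
    """Write n_mlps seeds as a JSON array, drawn like the documented
    default (secrets.randbits(63)), and return them."""
    seeds = [secrets.randbits(63) for _ in range(n_mlps)]
    Path(path).write_text(json.dumps(seeds))
    return seeds


def validate_seeds(seeds, n_mlps):
    """Hypothetical check mirroring the --mlp-seeds rules.
    Assumes the seed count must equal --n-mlps."""
    if len(seeds) != n_mlps:
        raise ValueError(f"expected {n_mlps} seeds, got {len(seeds)}")
    for s in seeds:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(s, bool) or not isinstance(s, int) or not 0 <= s < SEED_LIMIT:
            raise ValueError(f"invalid seed: {s!r}")


def is_valid_split_name(name):
    return SPLIT_NAME_RE.fullmatch(name) is not None


def parse_mlp_range(text):
    """Parse START-END (inclusive on both ends) into a range."""
    start_s, end_s = text.split("-")
    start, end = int(start_s), int(end_s)
    if start < 0 or end < start:
        raise ValueError(f"bad --mlp-range: {text!r}")
    return range(start, end + 1)


def parse_slice(text):
    """Parse K/N, where K is 0-indexed (so 0 <= K < N)."""
    k_s, n_s = text.split("/")
    k, n = int(k_s), int(n_s)
    if n <= 0 or not 0 <= k < n:
        raise ValueError(f"bad --slice: {text!r}")
    return k, n
```

Because --mlp-range is inclusive on both ends, parse_mlp_range("0-249") covers 250 MLPs, not 249.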

whest dataset upload

Upload a baked dataset to the HF Hub.

| Option | Default | Description |
|---|---|---|
| local_dir | | |
| --repo | | HF repo id (org/name). |
| --tag | | Optional git tag (e.g. v1). |
| --private | | |
| --token | | |
| --message | | |

whest dataset push

| Option | Default | Description |
|---|---|---|
| local_dir | | |
| --repo | | HF repo id (org/name). |
| --tag | | Optional git tag (e.g. v1). |
| --private | | |
| --token | | |
| --message | | |

whest dataset download

Download a dataset from the HF Hub.

| Option | Default | Description |
|---|---|---|
| repo_id | | |
| --revision | | |
| --output | | |
| --token | | |
| --split | | Optional: download only the specified split's parquet (and metadata/README). |
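The --split option trims a download to one split plus the shared files. The Python sketch below shows that kind of filter applied to a repository file listing. It assumes parquet shards follow the common Hugging Face Hub naming data/<split>-NNNNN-of-NNNNN.parquet and that metadata.json and README.md sit at the repo root; both are assumptions, so check docs/reference/dataset-format.md for the real layout. files_for_split is a hypothetical helper, not a whest API.

```python
import re

# Assumption: files every split download keeps, at the repo root.
SHARED_FILES = {"metadata.json", "README.md"}


def files_for_split(repo_files, split):
    """Return the subset of repo_files that a --split download would need,
    assuming shards are named data/<split>-NNNNN-of-NNNNN.parquet."""
    # Anchor on the shard suffix: hyphens are legal in split names, so a
    # plain "data/public-*" prefix would also pick up "data/public-v2-...".
    shard = re.compile(rf"data/{re.escape(split)}-\d+-of-\d+\.parquet")
    return [f for f in repo_files if f in SHARED_FILES or shard.fullmatch(f)]
```

Matching the full shard name rather than a prefix is what keeps two splits such as public and public-v2 from bleeding into each other.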

whest dataset pull

| Option | Default | Description |
|---|---|---|
| repo_id | | |
| --revision | | |
| --output | | |
| --token | | |
| --split | | Optional: download only the specified split's parquet (and metadata/README). |

whest dataset merge

Merge partial bakes into one dataset.

| Option | Default | Description |
|---|---|---|
| inputs | | Partial dataset directories. |
| --output | | |
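Merging is where slicing mistakes surface. If the partial bakes were produced with --mlp-range, their ranges should together cover MLP indices 0 through n_mlps - 1 exactly once. A minimal Python sketch of that check follows. check_ranges_tile is a hypothetical helper, and whether whest dataset merge performs an equivalent check itself is not stated on this page.

```python
def check_ranges_tile(ranges, n_mlps):
    """Hypothetical pre-merge check: do the inclusive (START, END) ranges
    of the partial bakes cover MLPs 0..n_mlps-1 exactly once?"""
    expected = 0  # next MLP index that should be covered
    for start, end in sorted(ranges):
        if start > expected:
            raise ValueError(f"gap: MLPs {expected}-{start - 1} are missing")
        if start < expected:
            raise ValueError(f"overlap: MLP {start} is baked twice")
        expected = end + 1
    if expected != n_mlps:
        raise ValueError(f"ranges cover 0-{expected - 1}, expected 0-{n_mlps - 1}")
```

For example, two bakes with --mlp-range 0-249 and --mlp-range 250-499 pass for n_mlps=500 in either order, since the ranges are sorted before checking.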

whest dataset info

Print dataset metadata.

| Option | Default | Description |
|---|---|---|
| source | | Local dir or HF repo id. |
| --revision | | |

whest dataset inspect

| Option | Default | Description |
|---|---|---|
| source | | Local dir or HF repo id. |
| --revision | | |
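Both info and inspect accept either a local directory or an HF repo id as source, and an id like org/name is also a valid relative path. The Python sketch below shows one way to disambiguate: treat source as local if such a directory exists, otherwise as a repo id. This resolution rule and the read_local_metadata helper are assumptions for illustration, as is the location of metadata.json at the dataset root; whest's actual behavior may differ.

```python
import json
from pathlib import Path


def read_local_metadata(source):
    """Hypothetical resolution of the `source` argument.

    If `source` names an existing local directory, read its metadata.json.
    Otherwise return None, meaning the caller should treat `source` as a
    Hugging Face repo id (org/name) and fetch metadata from the Hub.
    """
    path = Path(source)
    if path.is_dir():
        return json.loads((path / "metadata.json").read_text())
    return None
```

Checking the filesystem first means a local folder that happens to be named org/name shadows the Hub repo of the same name.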

whest dataset combine-splits

Combine N single-split datasets into a multi-split dataset directory.

| Option | Default | Description |
|---|---|---|
| input_dirs | | One or more complete single-split dataset directories. |
| --output | | Output directory (must not already exist). |
| --default-split | | Optional name of the split that downstream consumers fall back to when --split is omitted on a multi-split dataset. Must match one of the input splits. Recorded as 'default_split' in the combined metadata.json and used by whest run. |
| --skip-prepared-arrow | | Skip generating the prepared/<split>/ Arrow artifacts. By default, combine-splits writes a Dataset.save_to_disk() directory for each split so that whestbench.load_dataset can memory-map it directly on the consumer side, with no parquet→arrow conversion. Use this flag when the preparation cost outweighs the runtime benefit for your use case. |
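The --default-split behavior amounts to a small fallback rule on the consumer side. The Python sketch below illustrates it: an explicit --split wins, otherwise the 'default_split' recorded in metadata.json is used. resolve_split is a hypothetical helper rather than whest run's actual code, the set of available splits is passed in because the metadata key that lists them is not documented here, and the single-split fallback is an added assumption.

```python
def resolve_split(requested, available, metadata):
    """Hypothetical consumer-side split choice.

    requested: the --split value, or None when the flag is omitted.
    available: split names present in the dataset.
    metadata:  parsed metadata.json; only 'default_split' is read.
    """
    if requested is not None:
        if requested not in available:
            raise ValueError(f"unknown split {requested!r}; available: {sorted(available)}")
        return requested
    default = metadata.get("default_split")
    if default is not None:
        return default
    if len(available) == 1:
        # Assumption: a single-split dataset needs no recorded default.
        return next(iter(available))
    raise ValueError("no --split given and metadata has no default_split")
```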

whest dataset prepare-arrow

Patch an existing multi-split dataset directory with prepared/<split>/ Arrow artifacts so that consumers can skip the parquet→arrow conversion on a cold cache.

| Option | Default | Description |
|---|---|---|
| dataset_dir | | Path to an existing multi-split dataset directory (containing data/ and metadata.json). |
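The benefit of prepared/<split>/ is that a consumer can open it directly instead of converting parquet. In the Hugging Face datasets library, a Dataset.save_to_disk() directory is reopened with datasets.load_from_disk, which memory-maps the Arrow files. The Python sketch below shows only the decision a consumer might make per split; pick_load_source is hypothetical and is not how whestbench.load_dataset is known to work.

```python
from pathlib import Path


def pick_load_source(dataset_dir, split):
    """Hypothetical consumer-side choice for one split.

    Prefer prepared/<split>/ (a Dataset.save_to_disk() directory that can
    be memory-mapped as-is); otherwise fall back to the parquet files under
    data/, which need a parquet->arrow conversion on a cold cache.
    """
    root = Path(dataset_dir)
    prepared = root / "prepared" / split
    if prepared.is_dir():
        return "arrow", prepared
    return "parquet", root / "data"
```

Running prepare-arrow on an existing directory is what moves a split from the second branch to the first.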
