Contributor Guide

Use this page when you are working on the flopscope repository itself rather than only consuming the published API.

You will learn:

  • How the repository is organized across three packages
  • How to set up your development environment and run tests
  • How to work with client, server, and Docker workflows
  • How auto-generated documentation is maintained

Repository layout

This repository contains three Python packages plus docs and Docker assets:

| Path | Purpose |
| --- | --- |
| src/flopscope/ | Core library backed by NumPy |
| flopscope-client/src/flopscope/ | Client proxy used in sandboxed participant environments |
| flopscope-server/src/flopscope_server/ | ZMQ server that executes the real library |
| tests/ | Core library test suite |
| flopscope-client/tests/ | Client unit, integration, and adversarial tests |
| flopscope-server/tests/ | Server unit tests |
| website/content/docs/ | Docs source for the published site |
| website/public/ops.json | Generated slim API operation index consumed by /docs/api |
| website/public/api-data/ops/*.json | Generated per-operation detail payloads for canonical operation pages |
| website/.generated/public-api-routes.json | Generated canonical route manifest for /docs/api/... pages |
| website/.generated/op-doc-imports.ts | Generated static import map for operation docs |
| website/.generated/symbol-doc-imports.ts | Generated static import map for public helper and object docs |
| website/.generated/public-api-symbols.json | Generated manifest of non-registry public API pages |
| scripts/generate_api_docs.py | Regenerates API route manifests, per-operation payloads, and public symbol docs |
| docker/ | Local client-server and hardened evaluation images |

Initial setup

For normal work on the core package, docs, and root test suite:

git clone https://github.com/AIcrowd/flopscope.git
cd flopscope
make install

make install runs uv sync --all-extras and configures the local git hooks.

Which environment to use

The root environment covers the core package, linting, docs, and the main test suite. The client and server each also have their own pyproject.toml.

One important caveat: flopscope-server depends on the local flopscope package, which is not resolved from a package index in a fresh source checkout. For server development, run commands from the repository root with PYTHONPATH=src:flopscope-server/src instead of relying on cd flopscope-server && uv run ....

Common commands

Core library

make lint
make test
make test-numpy-compat
make docs-build
make docs-serve
make ci

If you prefer direct uv commands:

uv run pytest
uv run mkdocs serve

When running the local docs site and you want flopscope error messages to link to your local copy instead of the hosted site, set:

export FLOPSCOPE_DOCS_ROOT=http://localhost:3000/docs

If FLOPSCOPE_DOCS_ROOT is unset, flopscope falls back to the hosted docs at https://aicrowd.github.io/flopscope/docs.
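
The fallback rule above can be sketched in a few lines of Python. This is only an illustration: resolve_docs_root and doc_url are hypothetical helper names, not flopscope's actual API, and treating an empty value the same as an unset one is an assumption.

```python
import os

# Hosted fallback URL, quoted from the text above.
HOSTED_DOCS_ROOT = "https://aicrowd.github.io/flopscope/docs"


def resolve_docs_root(environ=None):
    """Return the docs base URL, preferring FLOPSCOPE_DOCS_ROOT when it is set.

    Hypothetical helper: an empty value is treated like an unset one (assumption).
    """
    env = os.environ if environ is None else environ
    root = env.get("FLOPSCOPE_DOCS_ROOT") or HOSTED_DOCS_ROOT
    return root.rstrip("/")


def doc_url(page, environ=None):
    """Join a docs-relative page path onto the resolved root."""
    return f"{resolve_docs_root(environ)}/{page.lstrip('/')}"
```

With the export shown earlier in place, doc_url("api") would point at the local site; without it, the same call would point at the hosted docs.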

Client package

The client package is independently installable, so its test suite can run via its own project file:

uv run --project flopscope-client pytest flopscope-client/tests

Client integration and adversarial tests start a real server subprocess using the repository root .venv/bin/python, so run make install first.

Server package

Run server tests from the repository root so the local core package is on PYTHONPATH:

PYTHONPATH=src:flopscope-server/src \
  uv run --with pyzmq --with msgpack pytest flopscope-server/tests

To launch the server manually from a source checkout:

PYTHONPATH=src:flopscope-server/src \
  uv run --with pyzmq --with msgpack \
  python -m flopscope_server --url ipc:///tmp/flopscope.sock
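
If you need to start the server from a Python script (for example in a local test fixture), the same invocation can be assembled programmatically. The sketch below is a hedged illustration: server_env and server_command are hypothetical helper names, and it assumes the interpreter you pass already has pyzmq and msgpack installed, since it does not go through uv.

```python
import os
import sys


def server_env(repo_root, base_env=None):
    """Copy the environment and prepend both source trees to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    paths = [
        os.path.join(repo_root, "src"),
        os.path.join(repo_root, "flopscope-server", "src"),
    ]
    existing = env.get("PYTHONPATH")
    if existing:
        paths.append(existing)  # keep whatever the caller already had
    env["PYTHONPATH"] = os.pathsep.join(paths)
    return env


def server_command(url="ipc:///tmp/flopscope.sock", python=sys.executable):
    """Build the argv for `python -m flopscope_server --url <url>`."""
    return [python, "-m", "flopscope_server", "--url", url]
```

A caller could then run subprocess.Popen(server_command(), env=server_env(".")) from the repository root and terminate the process when done.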

Running client and server together without Docker

From a source checkout, use repo-root commands so both packages resolve correctly:

# Terminal 1
PYTHONPATH=src:flopscope-server/src \
  uv run --with pyzmq --with msgpack \
  python -m flopscope_server --url ipc:///tmp/flopscope.sock

# Terminal 2
export FLOPSCOPE_SERVER_URL=ipc:///tmp/flopscope.sock
PYTHONPATH=flopscope-client/src \
  uv run --with pyzmq --with msgpack python your_script.py

See Running with Docker if you want the same split using containers.

Generated documentation

Do not hand-edit website/public/ops.json, website/public/api-data/ops/*.json, website/.generated/public-api-routes.json, website/.generated/op-doc-imports.ts, website/.generated/symbol-doc-imports.ts, or website/.generated/public-api-symbols.json. The interactive API reference, canonical API pages, and legacy redirect routes consume those generated artifacts directly.

Instead, update scripts/generate_api_docs.py, the relevant source docstrings, or the operation registry, then regenerate and verify:

uv run python scripts/generate_api_docs.py
uv run python scripts/generate_api_docs.py --verify

NumPy Compatibility Testing

flopscope's goal is NumPy API compatibility on the counted surface: import flopscope.numpy as np should work for supported functions. To verify this, we run NumPy's own test suite against flopscope.

How it works

A pytest conftest at tests/numpy_compat/conftest.py monkeypatches numpy functions with their flopscope equivalents at session start. When we point pytest at NumPy's installed test files using --pyargs, every test that calls np.sum(...), np.mean(...), etc. actually calls flopscope's version.

NumPy test file                conftest.py               flopscope
  calls np.sum(x)  ──────>   np.sum = fnp.sum  ──────>  fnp.sum(x)
  asserts result              (monkeypatch)              (FLOP-counted)

Avoiding infinite recursion

flopscope functions internally call numpy (for example, fnp.dot eventually delegates to _np.dot inside the implementation modules). Since _np is the numpy module, patching numpy.dot = fnp.dot without isolating those backend references would cause infinite recursion: fnp.dot → _np.dot → numpy.dot → fnp.dot → ...

We solve this by freezing numpy before patching: the conftest creates a snapshot of the numpy module (and its submodules like numpy.linalg, numpy.fft), then rebinds every flopscope module's _np reference to the frozen copy. Now flopscope's internal calls go to the original numpy functions, while the test suite sees flopscope's versions.

# Simplified flow in conftest.py:
frozen_np = freeze_numpy()           # snapshot of original numpy
rebind_flopscope_np(frozen_np)       # flopscope internals → frozen copy
patch_numpy()                        # np.sum = fnp.sum, etc.
# Now: test calls np.sum → fnp.sum → frozen_np.sum (original) ✓
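
To see why the order of these three steps matters, here is a self-contained toy version that does not touch numpy or flopscope at all. Everything in it is a stand-in: toy_backend plays the role of numpy, toy_impl plays the role of a flopscope implementation module with its _np reference, and freeze is a simplified guess at what a snapshot helper might do, not the real freeze_numpy.

```python
import types

# Toy stand-in for numpy: a module object exposing one function.
backend = types.ModuleType("toy_backend")
backend.double = lambda x: 2 * x

calls = []  # records counted calls, standing in for FLOP accounting

# Toy stand-in for a flopscope implementation module with an `_np` reference.
impl = types.ModuleType("toy_impl")
impl._np = backend


def counted_double(x):
    calls.append("double")
    return impl._np.double(x)  # `_np` is looked up at call time


impl.double = counted_double


def freeze(module):
    """Shallow snapshot: a new module object holding the current attributes."""
    frozen = types.ModuleType(module.__name__ + "_frozen")
    frozen.__dict__.update(
        {k: v for k, v in vars(module).items() if not k.startswith("__")}
    )
    return frozen


frozen = freeze(backend)        # 1. snapshot the original functions
impl._np = frozen               # 2. rebind the implementation's backend
backend.double = impl.double    # 3. patch the public module

result = backend.double(21)     # public call -> counted wrapper -> frozen original
```

If step 2 were skipped, counted_double would call backend.double, which by then is counted_double itself, and the call would end in a RecursionError.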

What gets patched

Of flopscope's 508 registered functions, most non-ufunc functions are patched onto numpy during testing. The only categories skipped:

| Category | Count | Why skipped |
| --- | --- | --- |
| Ufuncs | 101 | flopscope functions are plain callables, not ufuncs; they lack .reduce, .accumulate, .outer, .nargs. Tests check these attributes at collection time. |
| Blacklisted | 32 | Intentionally unsupported |
| linalg.outer | 1 | fnp.linalg.outer delegates to np.outer (not np.linalg.outer), which has different validation behavior |

Everything else -- free ops, counted custom ops (dot, einsum, etc.), submodule functions (linalg, fft), reductions, and special functions -- is patched.
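
The skip rules above amount to a small filter over the registry. The sketch below shows one way such a filter could look; the registry layout, the should_patch name, and the entry some_blacklisted_op are all hypothetical and chosen only for illustration.

```python
def should_patch(name, *, is_ufunc, blacklisted):
    """Return True if a registered function would be patched onto numpy."""
    if is_ufunc:
        return False  # plain callables lack .reduce, .accumulate, etc.
    if blacklisted:
        return False  # intentionally unsupported
    if name == "linalg.outer":
        return False  # different validation behavior from np.linalg.outer
    return True


# Toy registry: name -> flags. Real registry entries will look different.
toy_registry = {
    "add": {"is_ufunc": True, "blacklisted": False},
    "dot": {"is_ufunc": False, "blacklisted": False},
    "fft.fft": {"is_ufunc": False, "blacklisted": False},
    "linalg.outer": {"is_ufunc": False, "blacklisted": False},
    "some_blacklisted_op": {"is_ufunc": False, "blacklisted": True},
}

patched = [name for name, flags in toy_registry.items()
           if should_patch(name, **flags)]
```

In this toy registry only dot and fft.fft survive the filter.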

Test suites

We run 7 NumPy test modules covering core math, ufuncs, numerics, linear algebra, FFT, polynomials, and random:

| Suite | Module | Passed | xfailed |
| --- | --- | --- | --- |
| Core math | numpy._core.tests.test_umath | 4,668 | 13 |
| Ufunc infrastructure | numpy._core.tests.test_ufunc | 795 | 7 |
| Numeric operations | numpy._core.tests.test_numeric | 1,560 | 20 |
| Linear algebra | numpy.linalg.tests.test_linalg | 48 | 255 |
| FFT | numpy.fft.tests.test_pocketfft | 114 | 34 |
| Polynomials | numpy.polynomial.tests.test_polynomial | 36 | 2 |
| Random | numpy.random.tests.test_random | 142 | 0 |
| Total | | 7,363 | 331 |

All failures are tracked as xfails in tests/numpy_compat/xfails.py.
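
As a quick consistency check, the per-suite figures from the table can be summed and compared with the Total row. The numbers below are copied from the table; only the variable names are new.

```python
# (module, passed, xfailed) as reported in the table above.
results = [
    ("test_umath", 4668, 13),
    ("test_ufunc", 795, 7),
    ("test_numeric", 1560, 20),
    ("test_linalg", 48, 255),
    ("test_pocketfft", 114, 34),
    ("test_polynomial", 36, 2),
    ("test_random", 142, 0),
]

total_passed = sum(passed for _, passed, _ in results)
total_xfailed = sum(xfailed for _, _, xfailed in results)

# Share of all xfails that come from the linear algebra suite.
linalg_share = 255 / total_xfailed
```

The sums match the Total row (7,363 passed, 331 xfailed), and the linear algebra suite accounts for roughly 77% of all xfails.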

Running the tests

Tests use pytest-xdist for parallel execution across all CPU cores.

# Run everything (recommended)
make test-numpy-compat

# Run a single suite
uv run pytest tests/numpy_compat/ --pyargs numpy._core.tests.test_umath -n auto -q

# Filter to specific functions
uv run pytest tests/numpy_compat/ --pyargs numpy._core.tests.test_umath -k "sqrt" -n auto -v

# Run without parallelism (for debugging)
uv run pytest tests/numpy_compat/ --pyargs numpy._core.tests.test_umath -v --tb=short

The numpy_compat tests are excluded from the default pytest run (via pyproject.toml addopts) to prevent the monkeypatch from contaminating the main test suite. They run as a separate step in CI.

Known divergences (xfails)

Tests that fail due to known, accepted differences are tracked in tests/numpy_compat/xfails.py. Each entry maps a test pattern to a categorized reason:

| Category | Meaning | Examples |
| --- | --- | --- |
| NOT_IMPLEMENTED | Function exists but lacks a kwarg or edge case | Missing out=, where=, subok= kwargs |
| UNSUPPORTED_DTYPE | flopscope doesn't support this dtype | timedelta, object arrays |
| UFUNC_INTERNALS | Test relies on ufunc protocol | .reduce, __array_ufunc__ |
| BUDGET_SIDE_EFFECT | Test assumes no global state changes | Budget deduction during assertions |
| NUMPY_INTERNAL | Test uses numpy internals | _umath_tests, internal type tables |

The linalg suite has the most xfails (255) because flopscope's linalg wrappers don't support stacked/batched arrays, 0-size arrays, or some advanced kwargs that numpy's linalg tests exercise extensively.
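
The actual layout of xfails.py is not shown on this page, so the following is only a sketch of what "a test pattern mapped to a categorized reason" could look like. The Category enum mirrors the table above; the patterns, the reasons, the glob-style matching via fnmatch, and the find_xfail helper are all assumptions made for illustration.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Category(Enum):
    NOT_IMPLEMENTED = "not_implemented"
    UNSUPPORTED_DTYPE = "unsupported_dtype"
    UFUNC_INTERNALS = "ufunc_internals"
    BUDGET_SIDE_EFFECT = "budget_side_effect"
    NUMPY_INTERNAL = "numpy_internal"


# Hypothetical entries: glob pattern over pytest node IDs -> (category, reason).
XFAILS = {
    "*test_linalg.py::*stacked*": (
        Category.NOT_IMPLEMENTED, "linalg wrappers do not support stacked arrays"),
    "*test_umath.py::*timedelta*": (
        Category.UNSUPPORTED_DTYPE, "timedelta dtype is not supported"),
}


def find_xfail(nodeid):
    """Return (category, reason) for the first matching pattern, else None."""
    for pattern, entry in XFAILS.items():
        if fnmatchcase(nodeid, pattern):
            return entry
    return None
```

A conftest hook such as pytest_collection_modifyitems could call a helper like this for each collected item and attach pytest.mark.xfail with the returned reason; whether the real conftest works this way is not stated here.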

Triaging new failures

  1. Run a suite: uv run pytest tests/numpy_compat/ --pyargs <module> -n auto --tb=line
  2. Categorize each failure
  3. If it's a bug we should fix, create an issue
  4. If it's an accepted divergence, add it to xfails.py

Why monkeypatching (not subclassing)

We considered alternatives:

  • Array subclass with __array_ufunc__: Would intercept ufunc calls, but flopscope arrays are plain numpy.ndarray by design -- no custom tensor class.
  • Running tests with import flopscope as np: NumPy's test files import from numpy._core, numpy.testing, etc. -- can't redirect all internal imports.
  • Monkeypatching with frozen numpy: Simple, works with NumPy's existing test infrastructure, tests exactly what users experience (same function signatures), and the frozen-numpy trick prevents infinite recursion.
