Symmetry Detection Deep Dive
Contributor-level walkthrough of flopscope's symmetry detection algorithm.
You will learn:
- How the bipartite graph is constructed from einsum subscripts
- How the subset-keyed oracle detects and caches symmetry lazily
- How the sigma-loop derives label permutations from row permutations
- How Burnside's lemma counts unique elements for exact FLOP reduction
TL;DR: flopscope detects when an einsum expression has symmetry that allows computing fewer FLOPs. It does this by building a bipartite graph from the einsum subscripts, finding column permutations that preserve the graph structure, and using group theory to count how many unique terms exist. The savings can be dramatic -- a symmetric matrix multiplication can cost half as many FLOPs.
The problem
A multi-operand einsum like 'ij,ai,bj->ab' is decomposed by opt_einsum into
a sequence of pairwise contractions. At each step the optimizer must evaluate
candidate pairs -- and it needs to know, for each candidate intermediate, whether
the result is symmetric, so it can score it with a reduced cost.
When operands are SymmetricTensors, their per-operand symmetry is known
upfront. But there is a second source of symmetry: when the same Python object
appears at multiple operand positions, the output can be symmetric in index
labels contributed by those repeated operands -- even if the operands are dense.
The naive approach is to rerun a detection procedure at every step for every candidate subset. This is too expensive for large contractions. We want:
- Correctness -- detect all exploitable symmetry without false positives.
- Memoization -- compute each intermediate's symmetry at most once.
- Laziness -- only evaluate subsets that the path optimizer actually visits.
Subgraph symmetry detection achieves all three.
The bipartite graph
The core data structure is a bipartite graph over the einsum expression.
Left vertices (U): One U-vertex per axis of each operand. For a dense
operand with subscript "ai", each axis produces its own U-vertex (two total).
For a SymmetricTensor with subscript "ij" and declared symmetry S_2{i,j},
both axes still produce separate U-vertices -- per-operand symmetry does not
affect the graph topology. Instead, per-operand symmetry is handled entirely
by the expanded sigma-loop (see below), which uses the declared symmetry
generators as an additional source of row permutations.
Right vertices (labels): One right vertex per unique index label. Labels are partitioned into:
- V (free labels): appear in the final output subscript or in operands outside the current subset (they "cross the cut").
- W (summed labels): contracted entirely within the current subset.
Incidence: An edge from U-vertex u to label c has weight equal to the
multiplicity of c in the axes belonging to U-vertex u.
Identical-operand groups: Operands that are the same Python object are grouped. These groups are the source of induced symmetry.
Worked example
Consider 'ij,ai,bj->ab' with operands T, A, B where T is a dense tensor:
Subscripts: ij, ai, bj -> ab
Operands: T A BU-vertices (one per axis):
(T, 0)-- label set {i}(T, 1)-- label set {j}(A, 0)-- label set {a}(A, 1)-- label set {i}(B, 0)-- label set {b}(B, 1)-- label set {j}
Free labels at the top level: {a, b} (appear in output ->ab).
Summed labels at the top level: {i, j} (contracted out).
No identical operands in this example -- T, A, and B are distinct Python
objects.
Full bipartite graph
U (axes) Labels
----------------- ------
V (free):
(A, 0) ------------------------- a
(B, 0) ------------------------- b
W (summed):
(T, 0) ----------+
+--------------- i
(A, 1) ----------+
(T, 1) ----------+
+--------------- j
(B, 1) ----------+Now consider the subset {A, B} (positions 1 and 2):
- U-vertices in subset:
(A, 0),(A, 1),(B, 0),(B, 1) - Labels in subset: {a, i, b, j}
- Labels outside subset (in T): {i, j}
- Crossing labels (in subset AND in outside): {i, j}
- V at this step = {a, b} + {i, j} = {a, b, i, j} (all four -- {i,j} cross the cut)
- W at this step = {} (nothing is summed entirely within {A, B})
Induced subgraph for subset {A, B}
When we restrict to subset {A, B}, labels i and j cross the cut (they also
appear in T, outside the subset), so they move from W to V:
U (subset {A, B} only) Labels
---------------------- ------
V (all free):
(A, 0) ------------------------- a
(A, 1) ------------------------- i
(B, 0) ------------------------- b
(B, 1) ------------------------- j
W: (empty)The incidence matrix M at this subset (rows = U-vertices, columns = V+W):
a i b j
(A, 0): 1 0 0 0
(A, 1): 0 1 0 0
(B, 0): 0 0 1 0
(B, 1): 0 0 0 1The subset-keyed oracle
The key invariant is the pure-in-subset property: the symmetry of an intermediate tensor depends only on the set of original operands it was formed from, not on the order in which they were contracted. This is because:
- The bipartite graph structure is fixed for the full einsum.
- The induced subgraph on a subset
Sis fully determined by which operands are inS. - Symmetry is a property of the final intermediate, not its contraction history.
This property makes the subset key canonical. The oracle stores results in a
dict[frozenset[int], SubsetSymmetry] and returns cached results on
subsequent calls with the same subset.
from flopscope._opt_einsum._subgraph_symmetry import SubgraphSymmetryOracle
# One oracle per contract_path call
oracle = SubgraphSymmetryOracle(
operands=list(operands),
subscript_parts=input_parts,
per_op_groups=perm_groups,
output_chars=output_str,
)
# Lazy evaluation -- only computed on first access per subset
result = oracle.sym(frozenset({0, 1})) # SubsetSymmetry for intermediate from ops 0 and 1
result.output # V-side (output tensor) symmetry
result.inner # W-side (inner summation) symmetryThe detection algorithm
Goal
For a fixed subset S with incidence matrix M, we want the full group
of automorphisms of the labelled bipartite graph -- pairs (sigma, pi)
where sigma permutes identical-operand rows and pi permutes label columns,
such that applying pi to the columns of sigma(M) recovers M:
pi(sigma(M)) = MEvery such pi is a symmetry of the intermediate tensor built from S.
Restricted to V labels it contributes to the output (V-side) symmetry;
restricted to W labels it contributes to the inner (W-side) symmetry.
The V/W partition is part of the labelled structure, so legitimate
automorphisms must preserve it -- pi(V) is a subset of V and pi(W) is a subset of W -- and any
pi with a cycle crossing V to W is rejected.
Column fingerprints
For each label c, compute its column fingerprint col(c) -- the tuple of
incidence values down the rows of M. Labels with identical fingerprints are
candidates for symmetry equivalence. The fingerprint-to-label mapping is
used by the sigma-loop to derive pi in O(1) per label via hash lookup.
Earlier versions had a standalone fast path that detected S_k whenever labels shared a fingerprint, without running the sigma-loop. This was incorrect for non-S_k groups (see the C3 bug note below) and has been removed. Fingerprints are now used only for pi derivation inside the sigma-loop -- they are not a standalone detection mechanism.
Sigma loop: derive pi from generators
The sigma-loop iterates over generators of the row-permutation group on M, drawn from three sources:
-
Source A -- per-operand internal symmetry generators. For each operand that carries a declared
SymmetryGroup, each generator of that group is lifted to a row permutation on M (permuting only the rows belonging to that operand). This captures symmetry that was previously handled by orbit-based axis merging. -
Source B -- identical-operand swap generators. For each group of k identical operands (same Python object), the k-1 adjacent transpositions
(op_i, op_{i+1})are used as generators. Each such swap is lifted to a row permutation that exchanges the rows of the two operands. -
Source C -- coordinated axis relabeling. When identical operands share the same subscript pattern (e.g. both have subscript
ij), permuting axes uniformly across all copies is equivalent to relabeling dummy indices. Adjacent transpositions on W-only (summed) axes are generated, applied to every copy simultaneously. This is restricted to W-side labels because relabeling free (output) labels would change the output tensor.
All generators are collected and passed to Dimino's algorithm to build the full row-permutation group. The sigma-loop then iterates over all elements of this group (not just the generators), deriving pi for each.
For each group element sigma:
- Lift
sigmato a row permutation on M. - Compute
sigma(M)'s column fingerprints:sigma * col(c)for each labelc. - Derive the induced label permutation pi directly. For each label
l,pi(l)is the label whose M-column matchessigma(M)'s column forl-- a hash-table lookup in O(1). When multiple labels share a fingerprint (collision), pick the lex-first unused candidate. If any label has no match, reject this sigma. - Validate pi:
pi(V) is a subset of Vandpi(W) is a subset of W. Any cycle crossing V to W invalidates the sigma. - Collect pi as a generator literal restricted to V labels (and
separately to W labels). Non-identity generators become part of the
detected
SymmetryGroup.
The sigma-loop collects all non-identity pi restrictions as generator literals. These generators are passed to Dimino's algorithm to close the group and build the exact symmetry group on V (and separately on W).
Interactive explorer
Walk through each detection step interactively with the Symmetry Explorer — choose a preset example or define your own einsum expression.
Worked examples
Click to expand each example.
V-side and W-side
V-side groups are symmetries of the output tensor -- they reduce the number of unique output elements that need to be computed. W-side groups are symmetries among the contracted (summed) labels -- they reduce the number of unique summation terms. Both contribute multiplicatively to the cost reduction:
cost = dense_cost * (unique_output / total_output) * (unique_inner / total_inner)The output (V-side) reduction is always applied when the step's intermediate has a non-trivial permutation group on its free indices.
The inner (W-side) reduction is applied only when all labels in
the detected inner group are present as contracted indices in that
specific pairwise step. If any of those labels were contracted at an
earlier step and no longer appear, the inner reduction is skipped.
In the contraction path table, [W checkmark: ...] indicates the inner reduction was
applied, while [W: ...] indicates it was detected but not applied.
Inner symmetry can be toggled with flops.configure(use_inner_symmetry=False).
Exact group detection and Burnside counting
The sigma-loop collects all valid pi permutations as generator literals and builds a SymmetryGroup directly.
When the generated group equals S_k (the full symmetric group, checked via
order == k!), the existing stars-and-bars formula C(n+k-1, k) applies.
When the group is a proper subgroup (e.g., C_3 from einsum('ij,jk,ki->', A, A, A)),
Burnside's lemma gives the exact unique element count.
Complexity bound
The oracle evaluates each subset at most once. For a contract with N operands
and groups of sizes k_1, k_2, ...:
- Generator collection: Source A contributes O(rank) generators per operand
with declared symmetry. Source B contributes k-1 generators per identical group.
Source C contributes at most rank-1 generators per identical group with matching
subscripts. Total generators:
g = O(N * rank). - Group enumeration: Dimino's algorithm builds the full row-permutation
group from the generators in
O(|G| * g)compositions. - Pi derivation: For each of the
|G|group elements, deriving pi costsO(n_labels)via fingerprint hash lookup. - Per-subset total:
O(|G| * (g + n_labels)). - Number of subsets visited: at most
2^N(usually much less -- path algorithms visit only O(N^2) subsets in practice).
For the common case of a single pair of identical operands (|G| = 2, g = 1):
per-subset cost is O(n_labels).
Related pages
- Symmetry Savings -- practical guide to using symmetry
- FLOP Counting Model -- how costs are calculated