Inspection (SAE): Embedding

Sparse autoencoder tools for embedding encoders. Decomposes final-layer activations into sparse features, compares texts at the feature level, traces circuits, and measures dictionary health. Use sae-stats for cross-layer probe heatmaps. Requires embedding mode plus pulled embedding SAE checkpoints.

Prerequisite: aquin session start --id my-run --model gte-small · aquin load sae gte-small-l11

12 commands

aquin sae-stats

agent tool: run_sae_stats

Same as the LLM sae-stats command, but runs over the embedding encoder's layers. Requires embed SAE checkpoints (e.g. gte-small-l0 … l11). Exports a layer profile, the top features per layer, and a probe×layer heatmap for visualization tooling.

Flag	Description
--prompts*	JSON/JSONL probe file.
--layers	all or comma-separated encoder layers with embed SAE checkpoints.
--topk	Top features per layer (default: 10).
--save	Optional JSON export path.
--check	Save sae-stats-check.json and sae-stats-check.png in the current directory.
--output	Set to json for machine-readable stdout.

example

Probe schema: [{"id":"p1","text":"..."}]. Only layers with pulled embed SAE checkpoints (aquin load sae <model>-l<n>) are included.
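A probe file in the schema above can be generated from a list of strings. The following is a minimal sketch: the helper names (build_probes, to_json, to_jsonl) are hypothetical, and the one-object-per-line JSONL layout is an assumption, since only the JSON array form is shown above.

```python
import json

def build_probes(texts):
    """Wrap raw strings in the documented probe schema with ids p1, p2, ..."""
    return [{"id": f"p{i}", "text": t} for i, t in enumerate(texts, start=1)]

def to_json(probes):
    # A single JSON array, matching the schema shown above.
    return json.dumps(probes, ensure_ascii=False, indent=2)

def to_jsonl(probes):
    # Assumption: the JSONL variant holds one probe object per line.
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in probes)

probes = build_probes(["The cat sat on the mat.", "Interest rates rose again."])
print(to_json(probes))
print(to_jsonl(probes))
```

Writing either string to disk gives a file that could be passed to --prompts.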

aquin sae-features

agent tool: run_embed_sae_features

Runs text through the encoder and the SAE encoder, then returns the top-k active sparse features with their activation strengths. This is the entry point for understanding which concepts the embedding contains.

Flag	Description
--text*	Input text.
--topk	Number of features to return (default: 10).
--check	Save sae-features-check.json and sae-features-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
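To make "top-k active sparse features" concrete, here is a minimal sketch in plain Python. It assumes a standard ReLU SAE encoder, ReLU(W_enc x + b_enc); the weights are toy values and the function names are hypothetical, not aquin's implementation.

```python
def sae_encode(x, w_enc, b_enc):
    """Sparse activations ReLU(W_enc @ x + b_enc) for plain Python lists."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w_enc, b_enc)]

def top_k_features(acts, k=10):
    """Return (feature_idx, activation) pairs for the k strongest active features."""
    active = [(i, a) for i, a in enumerate(acts) if a > 0.0]
    return sorted(active, key=lambda p: p[1], reverse=True)[:k]

# Toy 2-d "embedding" and a 4-feature dictionary (illustrative values only).
x = [1.0, -2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [2.0, 0.75]]
b_enc = [0.0, 0.0, 0.0, 0.0]

acts = sae_encode(x, w_enc, b_enc)
print(top_k_features(acts, k=2))  # [(2, 1.5), (0, 1.0)]
```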

aquin sae-contrastive

agent tool: run_embed_sae_contrastive

Compares two texts at the SAE feature level. Returns features with the largest activation delta: what the encoder represents differently between the two inputs.

Flag	Description
--text_a*	First text.
--text_b*	Second text.
--topk	Top diverging features to report.
--corpus	Optional corpus for feature labeling.
--check	Save sae-contrastive-check.json and sae-contrastive-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
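The "largest activation delta" idea can be sketched as follows, assuming both texts have already been encoded into SAE activation vectors of equal length. The function name and the choice to rank by absolute difference are assumptions for illustration.

```python
def top_diverging(acts_a, acts_b, k=5):
    """Rank features by |activation_a - activation_b|, keeping the signed delta."""
    deltas = [(i, a - b) for i, (a, b) in enumerate(zip(acts_a, acts_b))]
    return sorted(deltas, key=lambda p: abs(p[1]), reverse=True)[:k]

# Toy activation vectors for text A and text B (illustrative values only).
acts_a = [0.0, 2.0, 0.5, 1.0]
acts_b = [1.5, 2.0, 0.0, 3.5]

print(top_diverging(acts_a, acts_b, k=2))  # [(3, -2.5), (0, -1.5)]
```

A negative delta here means the feature is stronger on text B than on text A.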

aquin sae-interp

agent tool: run_embed_sae_interp_score

Scores the interpretability of one SAE feature over a corpus: how consistently it fires on semantically related vs unrelated texts.

Flag	Description
--feature_idx*	Feature index.
--corpus*	JSON array of corpus strings.
--n_samples	Samples per scoring pass.
--check	Save sae-interp-check.json and sae-interp-check.png in the current directory.
--output json	Print raw JSON to stdout.

example

aquin sae-browser

agent tool: run_embed_sae_browser

Browses the most frequently active SAE features across a corpus. Surfaces the dominant concepts the encoder uses for that text collection.

Flag	Description
--corpus*	JSON array of strings.
--top_n_features	Features to list.
--check	Save sae-browser-check.json and sae-browser-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
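As a rough illustration of "most frequently active", the sketch below counts, for each feature, the fraction of corpus texts on which it fires. It assumes the per-text SAE activations are already available; treating any positive activation as "active" is an assumption, as is the function name.

```python
from collections import Counter

def most_frequent_features(corpus_acts, top_n_features=3):
    """Return (feature_idx, firing_rate) for the most often active features."""
    counts = Counter()
    for acts in corpus_acts:
        counts.update(i for i, a in enumerate(acts) if a > 0.0)
    n = len(corpus_acts)
    return [(i, c / n) for i, c in counts.most_common(top_n_features)]

# One activation vector per corpus text (illustrative values only).
corpus_acts = [
    [1.0, 0.0, 0.2],
    [0.5, 0.0, 0.0],
    [0.3, 0.9, 0.0],
    [0.1, 0.0, 0.4],
]
print(most_frequent_features(corpus_acts, top_n_features=2))  # [(0, 1.0), (2, 0.5)]
```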

aquin sae-graph

agent tool: run_embed_sae_network_graph

Builds a co-activation graph: nodes are SAE features, edges connect features that fire together above a threshold. Reveals feature communities in the dictionary.

Flag	Description
--corpus*	JSON array of strings.
--threshold	Co-activation threshold.
--top_n_features	Limit graph to top-N active features.
--check	Save sae-graph-check.json and sae-graph-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
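A co-activation graph of this kind can be sketched with sets. The example assumes Jaccard overlap of firing sets as the co-activation measure; the measure aquin actually uses is not stated here, so treat this as one plausible choice with hypothetical names.

```python
from itertools import combinations

def coactivation_edges(corpus_acts, threshold=0.5):
    """Edges (i, j, score) between features whose firing sets overlap enough."""
    n_features = len(corpus_acts[0])
    # For each feature, the set of text indices on which it is active.
    fires = [{d for d, acts in enumerate(corpus_acts) if acts[f] > 0.0}
             for f in range(n_features)]
    edges = []
    for i, j in combinations(range(n_features), 2):
        union = fires[i] | fires[j]
        if not union:
            continue  # neither feature ever fires
        score = len(fires[i] & fires[j]) / len(union)  # Jaccard overlap (assumed)
        if score >= threshold:
            edges.append((i, j, score))
    return edges

corpus_acts = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
edges = coactivation_edges(corpus_acts, threshold=0.5)
print(edges)
```

With these toy activations only features 0 and 1 are connected; lowering the threshold to 0.25 would also link features 0 and 2.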

aquin sae-circuit

agent tool: run_embed_sae_circuit

Traces how one target SAE feature's activation builds up layer-by-layer through the encoder. Shows where in the stack the concept first appears and how it strengthens.

Flag	Description
--text*	Input text.
--target_feature_idx*	Feature to trace.
--check	Save sae-circuit-check.json and sae-circuit-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
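The layer-by-layer trace can be summarized with two numbers: the first layer at which the feature is active, and how much it gains at each step. The sketch below assumes the target feature's activation has already been read out at every layer; how aquin obtains those per-layer values is not described here, and the function name is hypothetical.

```python
def trace_feature(layer_acts, eps=1e-6):
    """Return (first_active_layer, per-step gains) for one feature's activations."""
    first = next((layer for layer, a in enumerate(layer_acts) if a > eps), None)
    gains = [b - a for a, b in zip(layer_acts, layer_acts[1:])]
    return first, gains

# Activation of the target feature at layers 0..4 (illustrative values only).
layer_acts = [0.0, 0.0, 0.25, 0.75, 2.0]
first, gains = trace_feature(layer_acts)
print(first)  # 2
print(gains)  # [0.0, 0.25, 0.5, 1.25]
```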

aquin sae-steer

agent tool: run_embed_sae_steer

Boosts or suppresses one SAE feature's activation and measures the resulting cosine shift in the output embedding. Optionally re-ranks a corpus to show the impact on retrieval.

Flag	Description
--text*	Input text.
--feature_idx*	Feature to steer.
--delta*	Activation delta (positive = boost, negative = suppress).
--corpus	Corpus for retrieval re-ranking after steer.
--topk-retrieval	Top-k for retrieval comparison.

example
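The effect of steering on an embedding and on retrieval order can be sketched as below. It assumes a linear SAE decoder, so changing a feature's activation by delta adds delta times that feature's decoder direction to the embedding; the vectors are toy values and all names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def steer(embedding, decoder_dir, delta):
    """Shift the embedding by delta along the feature's decoder direction (assumed linear)."""
    return [e + delta * d for e, d in zip(embedding, decoder_dir)]

def rank(query, docs):
    """Indices of docs ordered by descending cosine similarity to the query."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)

emb = [1.0, 0.0]                             # original query embedding
dec = [0.0, 1.0]                             # decoder direction of the steered feature
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # corpus embeddings

steered = steer(emb, dec, delta=3.0)
shift = 1.0 - cosine(emb, steered)           # cosine shift caused by the steer
print(round(shift, 4))
print(rank(emb, docs), rank(steered, docs))  # [0, 2, 1] [1, 2, 0]
```

A positive delta pulls the query toward documents aligned with the feature's direction, which is why document 1 moves to the top after the steer.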

aquin sae-absorption

agent tool: run_embed_sae_absorption

Scans for feature absorption pairs (one feature's decoder absorbed into another) and near-duplicate decoder directions. Flags dictionary redundancy.

Flag	Description
--corpus*	JSON array of strings.
--top_n	Top features to scan.
--check	Save sae-absorption-check.json and sae-absorption-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
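The near-duplicate part of this scan can be illustrated with pairwise cosine similarity between decoder directions. This sketch covers only that half (it does not attempt absorption detection), and the 0.95 cutoff and function names are assumptions.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def near_duplicates(decoder_rows, min_cos=0.95):
    """Pairs (i, j, cos) of decoder directions that point almost the same way."""
    pairs = []
    for (i, u), (j, v) in combinations(enumerate(decoder_rows), 2):
        c = cosine(u, v)
        if c >= min_cos:
            pairs.append((i, j, c))
    return pairs

# One decoder direction per feature (illustrative values only).
decoder_rows = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
dupes = near_duplicates(decoder_rows)
print([(i, j) for i, j, _ in dupes])  # [(0, 1)]
```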

aquin sae-polysemy

agent tool: run_embed_sae_polysemy

Finds features that fire strongly on semantically unrelated sentences: polysemous or entangled features that hurt interpretability.

Flag	Description
--corpus*	JSON array of strings.
--top_n	Top features to analyze.
--check	Save sae-polysemy-check.json and sae-polysemy-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
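One way to quantify "fires strongly on semantically unrelated sentences" is to look at how similar the strongly firing texts are to each other. The sketch below is an assumed scoring rule, not aquin's: it takes the texts on which a feature fires at or above a cutoff and reports one minus their mean pairwise cosine similarity, so higher values suggest a more polysemous feature.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def polysemy_score(feature_acts, embeddings, min_act=1.0):
    """1 - mean pairwise cosine among texts where the feature fires >= min_act."""
    firing = [e for a, e in zip(feature_acts, embeddings) if a >= min_act]
    pairs = list(combinations(firing, 2))
    if not pairs:
        return 0.0  # fewer than two strongly firing texts: nothing to compare
    return 1.0 - sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Text embeddings and two features' activations on them (illustrative values only).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
entangled = polysemy_score([2.0, 1.5, 0.2], embeddings)  # fires on unrelated texts
clean = polysemy_score([2.0, 0.1, 1.8], embeddings)      # fires on similar texts
print(entangled, round(clean, 4))
```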

aquin sae-faithfulness

agent tool: run_embed_sae_retrieval_faithfulness

Ablates SAE features one at a time and measures NDCG drop on a query set. Identifies which features are load-bearing for retrieval quality.

Flag	Description
--queries*	JSON array of query strings.
--corpus*	JSON array of document strings.
--topk	Retrieval top-k.
--n_features_to_test	How many top features to ablate.
--check	Save sae-faithfulness-check.json and sae-faithfulness-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
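The NDCG drop for a single ablation can be computed as below. The sketch assumes relevance labels are available for each ranked document, which is a hypothetical input (how aquin derives relevance is not stated here), and it uses the common log2 discount.

```python
import math

def dcg(rels):
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(r / math.log2(rank + 2) for rank, r in enumerate(rels))

def ndcg_at_k(ranked_rels, k):
    """NDCG@k, where ranked_rels lists the relevance of every document in ranked order."""
    ideal = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal if ideal > 0 else 0.0

# Relevance of corpus documents in ranked order, before and after ablating one feature.
baseline_rels = [1, 1, 0, 0]
ablated_rels = [1, 0, 1, 0]

drop = ndcg_at_k(baseline_rels, k=2) - ndcg_at_k(ablated_rels, k=2)
print(round(drop, 4))
```

Repeating this per feature and sorting by drop would surface the features whose removal hurts retrieval most.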

aquin space-decomp

agent tool: run_embed_space_decomposition

Decomposes a set of texts into their dominant shared SAE features: which concepts span the whole collection vs which are text-specific.

Flag	Description
--texts*	JSON array of strings.
--top_n	Dominant features to report.
--check	Save space-decomp-check.json and space-decomp-check.png in the current directory.
--output json	Print raw JSON to stdout.

example
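The split between collection-wide and text-specific concepts can be sketched by counting how many texts each feature fires on. Treating "active on every text" as shared and "active on exactly one text" as text-specific is an assumption for illustration, as is the function name.

```python
def decompose(text_acts):
    """Return (shared, specific) feature indices across a set of texts."""
    n = len(text_acts)
    counts = {}
    for acts in text_acts:
        for i, a in enumerate(acts):
            if a > 0.0:
                counts[i] = counts.get(i, 0) + 1
    shared = sorted(i for i, c in counts.items() if c == n)    # fires on every text
    specific = sorted(i for i, c in counts.items() if c == 1)  # fires on one text only
    return shared, specific

# One SAE activation vector per text (illustrative values only).
text_acts = [
    [1.0, 0.5, 0.0, 0.0],
    [0.8, 0.0, 0.3, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]
shared, specific = decompose(text_acts)
print(shared, specific)  # [0] [1, 2]
```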