
Inspection (SAE): Embedding

Sparse autoencoder tools for embedding encoders. Decomposes final-layer activations into sparse features, compares texts at the feature level, traces circuits, and measures dictionary health. Use sae-stats for cross-layer probe heatmaps. Requires embedding mode plus pulled embedding SAE checkpoints.

Prerequisite: aquin session start --id my-run --model gte-small · aquin load sae gte-small-l11

12 commands

aquin sae-stats

agent tool: run_sae_stats

Same as LLM sae-stats but runs over embedding encoder layers. Requires embed SAE checkpoints (e.g. gte-small-l0 … l11). Exports layer profile, top features, and probe×layer heatmap for visualization tooling.

Flag        Description
--prompts*  JSON/JSONL probe file.
--layers    all or comma-separated encoder layers with embed SAE checkpoints.
--topk      Top features per layer (default: 10).
--save      Optional JSON export path.
--check     Save sae-stats-check.json and sae-stats-check.png in the current directory.
--output    json for machine-readable stdout.

Probe schema: [{"id":"p1","text":"..."}]. Only layers with pulled embed SAE checkpoints (aquin load sae <model>-l<n>) are included.
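
A minimal sketch of producing a probe file in the schema above, using only Python's standard library. The probe texts, the helper names, and the sanity check are illustrative assumptions; the JSONL variant assumes one probe object per line, which the page does not spell out.

```python
import json

# Hypothetical probes; only the "id" and "text" keys come from the schema above.
probes = [
    {"id": "p1", "text": "The cat sat on the mat."},
    {"id": "p2", "text": "Interest rates rose by half a point."},
]

def check_probes(items):
    """Sanity check (an assumption): each probe has a non-empty string id and text."""
    for p in items:
        if not (isinstance(p.get("id"), str) and p["id"]):
            raise ValueError(f"bad id in probe: {p!r}")
        if not (isinstance(p.get("text"), str) and p["text"]):
            raise ValueError(f"bad text in probe: {p!r}")
    return True

def write_probes_json(path, items):
    """Write probes as a single JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)

def write_probes_jsonl(path, items):
    """Write probes as JSONL (assumed layout: one object per line)."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")

check_probes(probes)
print(json.dumps(probes, indent=2))
```

Calling write_probes_json("probes.json", probes) would then give a file that could be passed to --prompts.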

aquin sae-features

agent tool: run_embed_sae_features

Runs text through the encoder and SAE encoder, returns the top-k active sparse features with activation strengths. Entry point for understanding what concepts the embedding contains.

Flag           Description
--text*        Input text.
--topk         Number of features to return (default: 10).
--check        Save sae-features-check.json and sae-features-check.png in the current directory.
--output json  Print raw JSON to stdout.
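
To make "top-k active sparse features" concrete, the sketch below applies a toy ReLU encoder to a made-up activation vector and keeps the k strongest features. The weights, the bias, and the ReLU encoder form are assumptions about a typical sparse autoencoder, not a description of Aquin's implementation.

```python
def sae_encode(x, w_enc, b_enc):
    """Toy SAE encoder: feature_j = ReLU(sum_i x[i] * w_enc[i][j] + b_enc[j])."""
    acts = []
    for j in range(len(b_enc)):
        pre = sum(x[i] * w_enc[i][j] for i in range(len(x))) + b_enc[j]
        acts.append(max(0.0, pre))
    return acts

def top_k_features(acts, k=10):
    """Return (feature_idx, activation) pairs for the k strongest non-zero features."""
    active = [(j, a) for j, a in enumerate(acts) if a > 0.0]
    active.sort(key=lambda pair: pair[1], reverse=True)
    return active[:k]

# Made-up 3-dimensional activation and a 5-feature dictionary.
x = [1.0, -2.0, 0.5]
w_enc = [
    [0.5, -1.0, 0.0, 2.0, 0.1],
    [0.0, 0.5, -1.0, 0.5, 0.0],
    [1.0, 0.0, 0.0, -2.0, 0.2],
]
b_enc = [0.0, 0.0, -0.5, 0.0, -0.5]

acts = sae_encode(x, w_enc, b_enc)
print(top_k_features(acts, k=2))  # [(2, 1.5), (0, 1.0)]
```

Only two of the five toy features are non-zero here, which is the sparsity that makes the top-k list readable.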

aquin sae-contrastive

agent tool: run_embed_sae_contrastive

Compares two texts at the SAE feature level. Returns features with the largest activation delta: what the encoder represents differently between the two inputs.

Flag           Description
--text_a*      First text.
--text_b*      Second text.
--topk         Top diverging features to report.
--corpus       Optional corpus for feature labeling.
--check        Save sae-contrastive-check.json and sae-contrastive-check.png in the current directory.
--output json  Print raw JSON to stdout.
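
The "largest activation delta" idea can be sketched as a ranking over per-feature differences. The activation vectors below are made up, and ranking by absolute signed difference is an assumption; the page does not say how Aquin orders or signs the deltas.

```python
def top_diverging_features(acts_a, acts_b, k=5):
    """Rank features by |activation_a - activation_b|, largest first.

    Returns (feature_idx, signed_delta) pairs; a positive delta means the
    feature is stronger for text A, a negative one stronger for text B.
    """
    deltas = [(j, a - b) for j, (a, b) in enumerate(zip(acts_a, acts_b))]
    deltas = [(j, d) for j, d in deltas if d != 0.0]
    deltas.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return deltas[:k]

# Made-up SAE activations for two texts over a 5-feature dictionary.
acts_a = [0.0, 2.0, 0.5, 0.0, 1.0]
acts_b = [0.0, 0.5, 0.5, 3.0, 0.0]

print(top_diverging_features(acts_a, acts_b, k=2))  # [(3, -3.0), (1, 1.5)]
```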

aquin sae-interp

agent tool: run_embed_sae_interp_score

Scores the interpretability of one SAE feature over a corpus: how consistently it fires on semantically related vs unrelated texts.

Flag            Description
--feature_idx*  Feature index.
--corpus*       JSON array of corpus strings.
--n_samples     Samples per scoring pass.
--check         Save sae-interp-check.json and sae-interp-check.png in the current directory.
--output json   Print raw JSON to stdout.

aquin sae-browser

agent tool: run_embed_sae_browser

Browses the most frequently active SAE features across a corpus. Surfaces the dominant concepts the encoder uses for that text collection.

Flag              Description
--corpus*         JSON array of strings.
--top_n_features  Features to list.
--check           Save sae-browser-check.json and sae-browser-check.png in the current directory.
--output json     Print raw JSON to stdout.
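
One way to read "most frequently active" is the fraction of corpus texts on which a feature has a non-zero activation. The sketch below counts that over made-up activations; the "activation > 0" criterion is an assumption, since the page does not define how the command measures frequency.

```python
from collections import Counter

def most_frequent_features(corpus_acts, top_n=3):
    """Return (feature_idx, fraction_of_texts_active) for the top_n features."""
    counts = Counter()
    for acts in corpus_acts:
        counts.update(j for j, a in enumerate(acts) if a > 0.0)
    n_texts = len(corpus_acts)
    return [(j, c / n_texts) for j, c in counts.most_common(top_n)]

# Made-up SAE activations: 4 texts over a 4-feature dictionary.
corpus_acts = [
    [1.0, 0.0, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.3, 0.1, 0.4, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]

print(most_frequent_features(corpus_acts, top_n=2))  # [(0, 1.0), (2, 0.5)]
```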

aquin sae-graph

agent tool: run_embed_sae_network_graph

Builds a co-activation graph: nodes are SAE features, edges connect features that fire together above a threshold. Reveals feature communities in the dictionary.

Flag              Description
--corpus*         JSON array of strings.
--threshold       Co-activation threshold.
--top_n_features  Limit graph to top-N active features.
--check           Save sae-graph-check.json and sae-graph-check.png in the current directory.
--output json     Print raw JSON to stdout.
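
The co-activation graph described above can be sketched as an edge list. Here the co-activation score for a pair is the fraction of texts on which both features are non-zero; that definition, the activations, and the threshold value are illustrative assumptions rather than Aquin's documented metric.

```python
from itertools import combinations

def coactivation_edges(corpus_acts, threshold=0.5):
    """Return (feature_i, feature_j, rate) for pairs that co-fire on at least
    `threshold` of the texts. Nodes are feature indices."""
    n_texts = len(corpus_acts)
    n_features = len(corpus_acts[0])
    edges = []
    for i, j in combinations(range(n_features), 2):
        both = sum(1 for acts in corpus_acts if acts[i] > 0.0 and acts[j] > 0.0)
        rate = both / n_texts
        if rate >= threshold:
            edges.append((i, j, rate))
    return edges

# Made-up activations: 4 texts over a 4-feature dictionary.
corpus_acts = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]

print(coactivation_edges(corpus_acts, threshold=0.5))  # [(0, 1, 0.75)]
```

Lowering the threshold to 0.25 in this toy corpus connects every pair, which shows why the threshold controls how visible the feature communities are.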

aquin sae-circuit

agent tool: run_embed_sae_circuit

Traces how one target SAE feature's activation builds up layer-by-layer through the encoder. Shows where in the stack the concept first appears and how it strengthens.

Flag                   Description
--text*                Input text.
--target_feature_idx*  Feature to trace.
--check                Save sae-circuit-check.json and sae-circuit-check.png in the current directory.
--output json          Print raw JSON to stdout.

aquin sae-steer

agent tool: run_embed_sae_steer

Boosts or suppresses one SAE feature activation and measures cosine shift in the output embedding. Optionally re-ranks a corpus to show retrieval impact.

Flag              Description
--text*           Input text.
--feature_idx*    Feature to steer.
--delta*          Activation delta (positive = boost, negative = suppress).
--corpus          Corpus for retrieval re-ranking after steer.
--topk-retrieval  Top-k for retrieval comparison.
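
A rough sketch of what "steer one feature and measure cosine shift" can look like on a toy dictionary. It assumes the steered embedding is rebuilt with a linear SAE decoder and that activations are clamped at zero after adding the delta; both are assumptions for illustration, as are the weights.

```python
import math

def decode(acts, w_dec):
    """Toy linear SAE decoder: embedding = sum_j acts[j] * w_dec[j]."""
    dim = len(w_dec[0])
    return [sum(acts[j] * w_dec[j][d] for j in range(len(acts))) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def steer(acts, feature_idx, delta):
    """Add delta to one feature (positive = boost, negative = suppress).
    Clamping at zero is an assumption of this sketch."""
    steered = list(acts)
    steered[feature_idx] = max(0.0, steered[feature_idx] + delta)
    return steered

# Made-up 2-feature dictionary with orthogonal decoder directions.
w_dec = [[1.0, 0.0], [0.0, 1.0]]
acts = [1.0, 0.0]

base = decode(acts, w_dec)
boosted = decode(steer(acts, feature_idx=1, delta=1.0), w_dec)
shift = 1.0 - cosine(base, boosted)
print(round(shift, 4))  # 0.2929
```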

aquin sae-absorption

agent tool: run_embed_sae_absorption

Scans for feature absorption pairs (one feature's decoder absorbed into another) and near-duplicate decoder directions. Flags dictionary redundancy.

Flag           Description
--corpus*      JSON array of strings.
--top_n        Top features to scan.
--check        Save sae-absorption-check.json and sae-absorption-check.png in the current directory.
--output json  Print raw JSON to stdout.
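
The near-duplicate half of this scan can be sketched as a pairwise cosine check over decoder directions. The decoder rows and the 0.95 cutoff are made up for illustration, and the sketch does not attempt the absorption-pair detection, whose criterion the page does not specify.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def near_duplicate_pairs(w_dec, min_cosine=0.95):
    """Return (feature_i, feature_j, cosine) for decoder rows that point in
    almost the same direction."""
    pairs = []
    for i, j in combinations(range(len(w_dec)), 2):
        c = cosine(w_dec[i], w_dec[j])
        if c >= min_cosine:
            pairs.append((i, j, c))
    return pairs

# Made-up decoder: features 0 and 1 are nearly parallel, feature 2 is orthogonal to 0.
w_dec = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
]

for i, j, c in near_duplicate_pairs(w_dec):
    print(f"features {i} and {j}: cosine {c:.3f}")
```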

aquin sae-polysemy

agent tool: run_embed_sae_polysemy

Finds features that fire strongly on semantically unrelated sentences: polysemous or entangled features that hurt interpretability.

Flag           Description
--corpus*      JSON array of strings.
--top_n        Top features to analyze.
--check        Save sae-polysemy-check.json and sae-polysemy-check.png in the current directory.
--output json  Print raw JSON to stdout.

aquin sae-faithfulness

agent tool: run_embed_sae_retrieval_faithfulness

Ablates SAE features one at a time and measures NDCG drop on a query set. Identifies which features are load-bearing for retrieval quality.

Flag                   Description
--queries*             JSON array of query strings.
--corpus*              JSON array of document strings.
--topk                 Retrieval top-k.
--n_features_to_test   How many top features to ablate.
--check                Save sae-faithfulness-check.json and sae-faithfulness-check.png in the current directory.
--output json          Print raw JSON to stdout.
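
For readers unfamiliar with the metric, here is a small sketch of NDCG@k and the drop between a baseline ranking and a ranking produced after ablating a feature. The relevance labels and both rankings are made up; how Aquin derives relevance for the query set is not stated on this page.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: sum of rel / log2(rank + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k for one query, given the relevance of each document in ranked order."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of the retrieved documents, in ranked order (1 = relevant, 0 = not).
baseline_ranking = [1, 1, 0, 0]  # before ablation: both relevant docs on top
ablated_ranking = [1, 0, 1, 0]   # after ablating a feature: one relevant doc slips

baseline = ndcg_at_k(baseline_ranking, k=3)
ablated = ndcg_at_k(ablated_ranking, k=3)
print(f"NDCG@3 drop: {baseline - ablated:.4f}")
```

A feature whose ablation produces a large drop would count as load-bearing in the sense described above; a drop near zero suggests the feature matters little for this query set.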

aquin space-decomp

agent tool: run_embed_space_decomposition

Decomposes a set of texts into their dominant shared SAE features: which concepts span the whole collection vs which are text-specific.

Flag           Description
--texts*       JSON array of strings.
--top_n        Dominant features to report.
--check        Save space-decomp-check.json and space-decomp-check.png in the current directory.
--output json  Print raw JSON to stdout.
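
The shared versus text-specific split can be sketched by checking, for each feature, which texts activate it. Treating "shared" as active on every text and "text-specific" as active on exactly one is an assumption of this sketch, as are the activations.

```python
def split_shared_specific(text_acts, shared_fraction=1.0):
    """Split features into those active on at least `shared_fraction` of the
    texts and those active on exactly one text.

    Returns (shared_feature_indices, {text_index: [specific_feature_indices]}).
    """
    n_texts = len(text_acts)
    n_features = len(text_acts[0])
    shared, specific = [], {}
    for j in range(n_features):
        owners = [t for t in range(n_texts) if text_acts[t][j] > 0.0]
        if len(owners) / n_texts >= shared_fraction:
            shared.append(j)
        elif len(owners) == 1:
            specific.setdefault(owners[0], []).append(j)
    return shared, specific

# Made-up SAE activations: 3 texts over a 4-feature dictionary.
text_acts = [
    [1.0, 0.5, 0.0, 0.0],
    [0.8, 0.0, 0.3, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]

shared, specific = split_shared_specific(text_acts)
print(shared)    # [0]
print(specific)  # {0: [1], 1: [2]}
```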