Inspection (SAE): LLMs

Sparse autoencoder (SAE) tools for language models. They decompose residual-stream activations into interpretable features, run full attribution pipelines, steer features at inference time, and benchmark feature quality. Use sae-stats for multi-layer batch exports and degradation heatmaps. After fine-tuning, use Checkpoint SAE (/docs/checkpoint-sae) for diff, temp train, and align on real checkpoints.

Prerequisite: aquin session start --id my-run --model gpt2-small · aquin load sae gpt2-small-l8

8 commands

aquin inspect

agent tool: run_full_inspection

Full attribution pipeline in one pass: generates the model response, runs causal mediation analysis per token/layer, decomposes top SAE features, renders logit lens per layer, and builds the circuit graph. This is the primary LLM inspection entry point. Syncs the complete inspection card to the web.

Flag	Description
--prompt*	Input prompt (model completes from here).
--layer*	SAE layer with a downloaded checkpoint (aquin load sae <model>-l<n>). If omitted, prints layers on disk and pull commands.
--check	Save inspect-check.json and inspect-check.png in the current directory.
--model	Override active model.

example

CLI-only shortcut. In aquin chat, ask: Run full inspection on "…"

aquin feature-logits

agent tool: get_feature_logits

Projects one SAE feature's decoder direction through the unembedding matrix and returns the top tokens promoted and suppressed. Answers: if I amplify this feature, which tokens become more likely?
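The projection described above can be sketched in a few lines of plain Python. This is a toy illustration, not aquin's implementation: the vocabulary, the 2-dimensional model width, the matrix values, and the function name feature_logits are all assumptions made up for the example.

```python
# Toy sketch: project one SAE decoder direction through an unembedding matrix.
# Shapes, values, and names are illustrative assumptions, not aquin internals.

def feature_logits(decoder_dir, unembed, vocab, topk=2):
    """Return (promoted, suppressed) lists of (token, logit_effect) pairs.

    decoder_dir: list of d_model floats (one feature's decoder row)
    unembed:     d_model rows, each with len(vocab) columns
    """
    # Logit effect per token is the dot product of the direction with
    # that token's column of the unembedding matrix.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=True)
    promoted = ranked[:topk]          # largest positive effect first
    suppressed = ranked[::-1][:topk]  # most negative effect first
    return promoted, suppressed


vocab = ["cat", "dog", "car", "tree"]
unembed = [
    [1.0, 0.8, -0.5, 0.0],
    [0.0, 0.1, 0.9, -1.0],
]
decoder_dir = [1.0, 0.0]

promoted, suppressed = feature_logits(decoder_dir, unembed, vocab)
print("promoted:", promoted)
print("suppressed:", suppressed)
```

Amplifying this toy feature would raise the logits of the promoted tokens and lower those of the suppressed ones, which is the question the command is meant to answer.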

Flag	Description
--feature*	SAE feature index.
--topk	Number of tokens to show (default: 10).
--prompt	Optional prompt context for activation weighting.
--check	Save feature-logits-check.json and feature-logits-check.png in the current directory.

example

aquin feature-neighbors

agent tool: get_feature_neighbors

Returns the nearest-neighbor SAE features by decoder cosine similarity. Use it to find redundant, related, or polysemous features near a target index.
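As a rough illustration of ranking by decoder cosine similarity, the sketch below compares one decoder row against all others in a tiny hand-written matrix. The matrix values and the helper names cosine and nearest_features are hypothetical; how aquin stores or normalizes decoder weights is not stated in this page.

```python
import math

# Toy sketch of nearest-neighbor lookup by decoder cosine similarity.
# The decoder matrix and helper names are illustrative assumptions.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_features(decoder, target, topk=8):
    """Return up to topk (feature_index, similarity) pairs, most similar first."""
    sims = [
        (idx, cosine(decoder[target], row))
        for idx, row in enumerate(decoder)
        if idx != target  # a feature is not its own neighbor
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:topk]


decoder = [
    [1.0, 0.0],   # feature 0: the target
    [0.9, 0.1],   # feature 1: nearly parallel, likely redundant
    [0.0, 1.0],   # feature 2: orthogonal, unrelated
    [-1.0, 0.0],  # feature 3: opposite direction
]

neighbors = nearest_features(decoder, target=0, topk=2)
print(neighbors)
```

A similarity close to 1 suggests two features write nearly the same direction into the residual stream, which is one sign of redundancy.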

Flag	Description
--feature*	SAE feature index.
--topk	Number of neighbors (default: 8).
--check	Save feature-neighbors-check.json and feature-neighbors-check.png in the current directory.

example

aquin steer

agent tool: run_steer_and_show

Adds a multiple of one SAE feature's decoder direction, scaled by --strength, to the residual stream at the SAE layer during the forward pass. Compares baseline vs steered output side by side. Feature label auto-resolves from causal labeling if omitted.
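The core arithmetic of this kind of steering is a single vector addition. The sketch below shows it on a 3-dimensional toy residual vector; the values and the function name steer are assumptions for illustration, and a real run would apply the addition inside the model's forward pass rather than to a standalone list.

```python
# Toy sketch of activation steering: residual + strength * decoder_direction.
# Vector values and the function name are illustrative assumptions.

def steer(residual, decoder_dir, strength=1.0):
    """Return a new residual vector shifted along one feature's decoder direction."""
    return [r + strength * d for r, d in zip(residual, decoder_dir)]


residual = [0.5, -1.0, 2.0]     # residual-stream activation at the SAE layer
decoder_dir = [0.0, 1.0, 0.0]   # decoder direction of the steered feature

baseline = list(residual)
steered = steer(residual, decoder_dir, strength=1.5)

print("baseline:", baseline)
print("steered: ", steered)
```

Only the component along the decoder direction changes, so comparing generations from the baseline and steered activations isolates the effect of that one feature.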

Flag	Description
--prompt*	Input prompt.
--feature_idx*	SAE feature to steer.
--feature_label	Human-readable label (auto-resolved if omitted).
--strength	Steering multiplier (default: 1.0).
--max_new_tokens	Generation length.

example

aquin multi-steer

agent tool: run_multi_steer

Steers multiple SAE features simultaneously in a single forward pass. Pass a JSON array of {feature_idx, strength, label} objects.

Flag	Description
--prompt*	Input prompt.
--features*	JSON array, e.g. '[{"feature_idx":42,"strength":1.5}]'
--max_new_tokens	Generation length.
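Hand-writing the JSON for --features inside shell quotes is easy to get wrong, so one option is to build the string programmatically. The sketch below uses only the flags and keys listed above; the prompt text, feature indices, strengths, and label are made-up values, and treating label as optional is an assumption based on the flag table's own example, which omits it.

```python
import json
import shlex

# Sketch: build a shell-safe `aquin multi-steer` command line.
# Feature indices, strengths, label, and prompt are made-up example values.
features = [
    {"feature_idx": 42, "strength": 1.5},
    {"feature_idx": 7, "strength": 0.8, "label": "example label"},
]

# Compact JSON keeps the argument short; shlex.quote protects the quotes
# and brackets from the shell.
payload = json.dumps(features, separators=(",", ":"))
command = " ".join([
    "aquin", "multi-steer",
    "--prompt", shlex.quote("The weather today is"),
    "--features", shlex.quote(payload),
])

print(command)
```

The printed line can be pasted into a shell as-is; shlex.quote wraps the payload in single quotes, matching the quoting style shown in the flag table.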

example

aquin benchmark

agent tool: run_benchmarks_on_top_feature

Runs InterpScore, Feature Purity, and MUI (Monosemanticity Under Intervention) on a single SAE feature. Quantifies how interpretable and causally coherent the feature is.

Flag	Description
--feature_idx*	SAE feature index.
--check	Save benchmark-check.json and benchmark-check.png in the current directory.

example

Distinct from behavioral evals (audit, red-team). This scores one feature's interpretability.

aquin sae-stats

agent tool: run_sae_stats

Batch export of multi-layer SAE statistics over a probe dataset. Computes per-layer mean L0 sparsity, top-k firing features, and a probe×layer heatmap for degradation profiles and stressor comparisons. Works for any layer with a pulled SAE checkpoint.
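To make the statistics concrete, the sketch below computes L0 (the number of active features) per probe and layer, then averages per layer. It is a toy under stated assumptions: the activation values are invented, "active" is taken to mean strictly greater than zero, and filling each heatmap cell with the per-probe L0 is a guess at one plausible definition, not a description of what aquin exports.

```python
# Toy sketch: per-probe L0 counts and per-layer mean L0.
# Activation values and the heatmap definition are illustrative assumptions.

def l0(activations, eps=0.0):
    """Count features whose activation exceeds eps."""
    return sum(1 for a in activations if a > eps)


# acts[probe_id][layer] -> SAE feature activations for that probe at that layer
acts = {
    "h1": {0: [0.0, 2.0, 0.0, 1.0], 9: [0.5, 0.0, 0.0, 0.0]},
    "h2": {0: [1.0, 1.0, 1.0, 0.0], 9: [0.0, 0.0, 3.0, 2.0]},
}
layers = [0, 9]

# probe x layer grid: one row per probe, one column per layer
heatmap = {
    probe_id: [l0(per_layer[layer]) for layer in layers]
    for probe_id, per_layer in acts.items()
}

# Per-layer mean L0 across probes
mean_l0 = {
    layer: sum(row[col] for row in heatmap.values()) / len(heatmap)
    for col, layer in enumerate(layers)
}

print(heatmap)
print(mean_l0)
```

Comparing rows of such a grid across stressors is one way to read a degradation profile: a probe whose L0 drifts far from the baseline row at a given layer stands out.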

Flag	Description
--prompts*	JSON/JSONL probe file. Each row: text (or prompt) plus optional id, stressor, lang, quant_run_id.
--layers	Layer spec: all (default) or comma-separated, e.g. 9 or 0,9,15.
--topk	Top features per layer (default: 10).
--save	Write full schema_version=1 JSON export to this path.
--check	Save sae-stats-check.json and sae-stats-check.png in the current directory.
--output	Set to json for machine-readable stdout.

example

Probe schema: [{"id":"h1","text":"...","stressor":"baseline","lang":"en"}]. Export schema_version=1 JSON: layer_stats (per-layer top_features), layer_profile (mean_l0/sparsity), heatmap (probe×layer). Syncs SaeStatsCard to the web.
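A probe file following the schema above can be generated and sanity-checked before it is passed to --prompts. The sketch below is a minimal example: the probe texts, the "translation" stressor value, and the helper check_probe are hypothetical, and the only rule it enforces is the one stated in the flag table, namely that each row carries text or prompt.

```python
import json

# Sketch: build JSONL probe rows for `aquin sae-stats --prompts`.
# Probe contents and the check_probe helper are illustrative assumptions.

def check_probe(row):
    """Require the one field the flag table marks as mandatory: text or prompt."""
    if "text" not in row and "prompt" not in row:
        raise ValueError(f"probe {row.get('id', '?')} needs 'text' or 'prompt'")
    return row


probes = [
    {"id": "h1", "text": "The capital of France is",
     "stressor": "baseline", "lang": "en"},
    {"id": "h2", "prompt": "La capitale de la France est",
     "stressor": "translation", "lang": "fr"},
]

# One JSON object per line (JSONL); ensure_ascii=False keeps non-English text readable.
jsonl = "\n".join(json.dumps(check_probe(row), ensure_ascii=False) for row in probes)

print(jsonl)
```

Writing the resulting string to a file such as probes.jsonl (a placeholder name) gives one probe per line, which keeps large probe sets easy to diff and append to.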

aquin umap

agent tool: ensure_umap_loaded

Loads the UMAP projection of all SAE decoder directions and opens the UMAP Explorer panel on the web. Requires a precomputed UMAP file for the pulled SAE.

example