
Inspection (SAE): LLMs

Sparse autoencoder (SAE) tools for language models. They decompose residual-stream activations into interpretable features, run full attribution pipelines, steer features at inference time, and benchmark feature quality. Use sae-stats for multi-layer batch exports and degradation heatmaps. After fine-tuning, use Checkpoint SAE (/docs/checkpoint-sae) for its diff, temp train, and align workflows on real checkpoints.

Prerequisite: aquin session start --id my-run --model gpt2-small · aquin load sae gpt2-small-l8
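
The decomposition these tools rely on can be pictured with a common ReLU sparse autoencoder formulation: the encoder maps a residual-stream vector to non-negative feature activations, and the decoder rebuilds the vector as a bias plus a weighted sum of per-feature decoder directions. The pure-Python sketch below assumes that formulation; encode, decode, and the weight layout are hypothetical illustrations, not Aquin's API, and the checkpoints fetched by aquin load sae may use a different architecture.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Residual vector (d_model) -> feature activations (n_features).

    Assumed formulation: f = ReLU(W_enc (x - b_dec) + b_enc),
    with w_enc shaped n_features x d_model.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = matvec(w_enc, centered)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def decode(f: Vector, w_dec_t: Matrix, b_dec: Vector) -> Vector:
    """Feature activations -> reconstructed residual vector.

    x_hat = b_dec + sum_i f_i * d_i, where w_dec_t[i] is feature i's decoder direction.
    """
    x_hat = list(b_dec)
    for act, direction in zip(f, w_dec_t):
        if act > 0.0:  # inactive features contribute nothing
            x_hat = [xh + act * d for xh, d in zip(x_hat, direction)]
    return x_hat


# Toy SAE: d_model = 2, n_features = 3 (hypothetical weights)
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
b_dec = [0.0, 0.0]
w_dec_t = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

f = encode([2.0, 3.0], w_enc, b_enc, b_dec)
print(f)                          # [2.0, 3.0, 0.0]
print(decode(f, w_dec_t, b_dec))  # [2.0, 3.0]
```

The number of non-zero entries in f (here 2) is the L0 of that activation, the quantity sae-stats averages per layer.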

8 commands

aquin inspect

agent tool: run_full_inspection

Runs the full attribution pipeline in one pass: generates the model response, runs causal mediation analysis per token and layer, decomposes the top SAE features, renders the logit lens at each layer, and builds the circuit graph. This is the primary entry point for LLM inspection, and it syncs the complete inspection card to the web.

Flag | Description
--prompt* | Input prompt; the model completes from here.
--layer* | SAE layer with a downloaded checkpoint (aquin load sae <model>-l<n>). If omitted, the command prints the layers on disk and the pull commands.
--check | Save inspect-check.json and inspect-check.png in the current directory.
--model | Override the active model.

aquin inspect is a CLI-only shortcut. In aquin chat, ask instead: Run full inspection on "…"
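
The logit lens step of aquin inspect decodes each layer's residual stream with the model's unembedding to show which token the model would predict if it stopped at that layer. A minimal sketch, assuming a parameter-free layer norm stands in for the model's final LayerNorm and that w_u is laid out d_model × vocab; logit_lens and layer_norm are hypothetical helpers, not part of the aquin CLI.

```python
import math
from typing import Dict, List


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Parameter-free LayerNorm (assumed stand-in for the model's final norm)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def logit_lens(resid_by_layer: Dict[int, List[float]],
               w_u: List[List[float]],
               vocab: List[str]) -> Dict[int, str]:
    """Return the top-1 token obtained by unembedding each layer's residual."""
    top: Dict[int, str] = {}
    for layer, resid in sorted(resid_by_layer.items()):
        h = layer_norm(resid)
        # w_u is d_model x vocab: logit_j = sum_i h_i * w_u[i][j]
        logits = [sum(h[i] * w_u[i][j] for i in range(len(h)))
                  for j in range(len(vocab))]
        best = max(range(len(vocab)), key=lambda j: logits[j])
        top[layer] = vocab[best]
    return top


# Toy model: d_model = 2, vocab of 2 tokens (hypothetical values)
vocab = ["Paris", "London"]
w_u = [[1.0, 0.0], [0.0, 1.0]]
resid = {0: [0.0, 1.0], 8: [3.0, 1.0]}
print(logit_lens(resid, w_u, vocab))  # {0: 'London', 8: 'Paris'}
```

Watching the top token change from layer to layer is what makes the per-layer logit lens useful: it shows roughly where the model commits to its answer.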

aquin feature-logits

agent tool: get_feature_logits

Projects one SAE feature's decoder direction through the unembedding matrix and returns the tokens it most promotes and most suppresses. It answers the question: if I amplify this feature, which tokens become more likely?

Flag | Description
--feature* | SAE feature index.
--topk | Number of tokens to show (default: 10).
--prompt | Optional prompt context for activation weighting.
--check | Save feature-logits-check.json and feature-logits-check.png in the current directory.
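
The projection behind feature-logits amounts to a dot product between the feature's decoder direction and each token's column of the unembedding matrix: positive scores are promoted tokens, negative scores suppressed ones. This sketch assumes the direct path only (no final layer norm and no --prompt activation weighting); feature_logits is a hypothetical name, not Aquin's implementation.

```python
from typing import List, Tuple

Ranked = List[Tuple[str, float]]


def feature_logits(direction: List[float], w_u: List[List[float]],
                   vocab: List[str], topk: int = 10) -> Tuple[Ranked, Ranked]:
    """Score every token j by direction . W_U[:, j]; return (promoted, suppressed)."""
    scores = [sum(direction[i] * w_u[i][j] for i in range(len(direction)))
              for j in range(len(vocab))]
    ranked = sorted(zip(vocab, scores), key=lambda p: p[1], reverse=True)
    promoted = ranked[:topk]            # largest positive logit effect
    suppressed = ranked[::-1][:topk]    # most negative logit effect
    return promoted, suppressed


# Toy unembedding, d_model x vocab (hypothetical values)
vocab = ["cat", "dog", "the"]
w_u = [[2.0, 0.0, 1.0],
       [0.0, 2.0, 1.0]]
promoted, suppressed = feature_logits([1.0, -1.0], w_u, vocab, topk=1)
print(promoted, suppressed)  # [('cat', 2.0)] [('dog', -2.0)]
```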

aquin feature-neighbors

agent tool: get_feature_neighbors

Returns the nearest-neighbor SAE features, ranked by cosine similarity between decoder directions. Use it to find redundant, related, or polysemantic features near a target index.

Flag | Description
--feature* | SAE feature index.
--topk | Number of neighbors (default: 8).
--check | Save feature-neighbors-check.json and feature-neighbors-check.png in the current directory.
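
A sketch of the neighbor search, assuming the decoder is stored with one row per feature (w_dec_t[i] is feature i's direction). feature_neighbors is a hypothetical helper that ranks every other feature by cosine similarity to the target and keeps the top k.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def feature_neighbors(w_dec_t: List[List[float]], feature: int,
                      topk: int = 8) -> List[Tuple[int, float]]:
    """Rank all other features by decoder cosine similarity to `feature`."""
    target = w_dec_t[feature]
    sims = [(j, cosine(target, row)) for j, row in enumerate(w_dec_t) if j != feature]
    sims.sort(key=lambda p: p[1], reverse=True)
    return sims[:topk]


# Toy decoder with 4 features in d_model = 2 (hypothetical values)
w_dec_t = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print([j for j, _ in feature_neighbors(w_dec_t, feature=0, topk=2)])  # [1, 2]
```

Scores near 1 indicate near-duplicate (possibly redundant) directions; scores near -1 indicate anti-aligned ones.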

aquin steer

agent tool: run_steer_and_show

Adds a scaled multiple of one SAE feature's decoder direction to the residual stream at the SAE layer during the forward pass, then shows the baseline and steered outputs side by side. If --feature_label is omitted, the label is resolved automatically from causal labeling.

Flag | Description
--prompt* | Input prompt.
--feature_idx* | SAE feature to steer.
--feature_label | Human-readable label (auto-resolved if omitted).
--strength | Steering multiplier (default: 1.0).
--max_new_tokens | Generation length.
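
Conceptually, steering adds strength × d_f to the residual stream at one layer and lets the rest of the forward pass run unchanged. The toy sketch below treats the network as a list of layer functions operating on a single residual vector and injects the direction right after the chosen layer; forward, add_direction, and the inject-after-the-layer choice are assumptions for illustration, not how Aquin hooks the model.

```python
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add_direction(resid: Vector, direction: Vector, strength: float) -> Vector:
    """resid + strength * direction (element-wise)."""
    return [r + strength * d for r, d in zip(resid, direction)]


def forward(x: Vector, layers: List[Layer], steer_layer: Optional[int] = None,
            direction: Optional[Vector] = None, strength: float = 1.0) -> Vector:
    """Run the toy layers in order, injecting the steering vector after steer_layer."""
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == steer_layer and direction is not None:
            h = add_direction(h, direction, strength)
    return h


def double(h: Vector) -> Vector:
    """Hypothetical stand-in for a transformer block."""
    return [2.0 * v for v in h]


layers = [double, double]
baseline = forward([1.0, 0.0], layers)
steered = forward([1.0, 0.0], layers, steer_layer=0, direction=[0.0, 1.0], strength=1.5)
print(baseline, steered)  # [4.0, 0.0] [4.0, 3.0]
```

Only the steered run picks up the injected direction, and every later layer propagates it, which is why the two outputs can diverge.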

aquin multi-steer

agent tool: run_multi_steer

Steers multiple SAE features simultaneously in a single forward pass. Pass a JSON array of {feature_idx, strength, label} objects.

Flag | Description
--prompt* | Input prompt.
--features* | JSON array, e.g. '[{"feature_idx":42,"strength":1.5}]'
--max_new_tokens | Generation length.
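
If each feature contributes its own scaled decoder direction, multi-steer can be modeled as adding the sum of those contributions in one pass. The sketch assumes that additive model and, hypothetically, defaults a missing strength to 1.0 to mirror aquin steer; parse_features, multi_steer, and features_arg are illustrative names. features_arg uses json.dumps with compact separators to produce a --features value in the same shape as the example in the table.

```python
import json
from typing import Any, Dict, List


def parse_features(spec: str) -> List[Dict[str, Any]]:
    """Parse a --features-style JSON array; a missing strength defaults to 1.0 (assumption)."""
    items = json.loads(spec)
    for item in items:
        if "feature_idx" not in item:
            raise ValueError(f"missing feature_idx in {item!r}")
        item.setdefault("strength", 1.0)
    return items


def multi_steer(resid: List[float], w_dec_t: List[List[float]], spec: str) -> List[float]:
    """Add the sum of strength_k * d_k over every requested feature k (assumed additive)."""
    delta = [0.0] * len(resid)
    for item in parse_features(spec):
        direction = w_dec_t[item["feature_idx"]]
        delta = [acc + item["strength"] * d for acc, d in zip(delta, direction)]
    return [r + d for r, d in zip(resid, delta)]


def features_arg(items: List[Dict[str, Any]]) -> str:
    """Serialize compactly, matching the shape of the --features example."""
    return json.dumps(items, separators=(",", ":"))


spec = features_arg([{"feature_idx": 0, "strength": 1.5}, {"feature_idx": 1}])
print(spec)  # [{"feature_idx":0,"strength":1.5},{"feature_idx":1}]
print(multi_steer([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], spec))  # [1.5, 1.0]
```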

aquin benchmark

agent tool: run_benchmarks_on_top_feature

Runs InterpScore, Feature Purity, and MUI (Monosemanticity Under Intervention) on a single SAE feature to quantify how interpretable and causally coherent it is.

Flag | Description
--feature_idx* | SAE feature index.
--check | Save benchmark-check.json and benchmark-check.png in the current directory.

This is distinct from behavioral evals (audit, red-team): it scores the interpretability of one feature, not the behavior of the model.

aquin sae-stats

agent tool: run_sae_stats

Exports multi-layer SAE statistics in batch over a probe dataset. Computes the mean L0 sparsity per layer, the top-k firing features per layer, and a probe×layer heatmap for building degradation profiles and comparing stressors. Works for any layer with a pulled SAE checkpoint.

Flag | Description
--prompts* | JSON/JSONL probe file. Each row: text (or prompt) plus optional id, stressor, lang, quant_run_id.
--layers | Layer spec: all (default) or a comma-separated list, e.g. 9 or 0,9,15.
--topk | Top features per layer (default: 10).
--save | Write the full schema_version=1 JSON export to this path.
--check | Save sae-stats-check.json and sae-stats-check.png in the current directory.
--output | Set to json for machine-readable stdout.

Probe schema: [{"id":"h1","text":"...","stressor":"baseline","lang":"en"}]
Export (schema_version=1 JSON): layer_stats (per-layer top_features), layer_profile (mean_l0/sparsity), heatmap (probe×layer).
Syncs SaeStatsCard to the web.
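
The per-layer statistics can be sketched from raw feature activations. This sketch assumes L0 counts features with activation > 0 on a token, each heatmap cell is a probe's mean L0 at one layer, and top features are ranked by how many tokens they fire on. The output keys loosely echo the export fields named above, but their inner structure is hypothetical, not the real schema_version=1 layout.

```python
from collections import Counter
from typing import Dict, List

# probe_id -> layer -> per-token feature activations (tokens x n_features)
# Assumption: every probe covers every layer (a missing layer raises KeyError).
Acts = Dict[str, Dict[int, List[List[float]]]]


def l0(vec: List[float]) -> int:
    """Number of active (non-zero) features on one token."""
    return sum(1 for a in vec if a > 0)


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def sae_stats(acts: Acts, topk: int = 10) -> dict:
    layers = sorted({layer for per_layer in acts.values() for layer in per_layer})
    # heatmap cell (probe, layer): mean L0 over that probe's tokens
    heatmap = {pid: {layer: mean([l0(tok) for tok in per_layer[layer]]) for layer in layers}
               for pid, per_layer in acts.items()}
    # layer profile: mean of each heatmap column over all probes
    layer_profile = {layer: {"mean_l0": mean([row[layer] for row in heatmap.values()])}
                     for layer in layers}
    # top features: ranked by the number of tokens they fire on, across all probes
    layer_stats = {}
    for layer in layers:
        counts: Counter = Counter()
        for per_layer in acts.values():
            for tok in per_layer[layer]:
                counts.update(i for i, a in enumerate(tok) if a > 0)
        layer_stats[layer] = {"top_features": counts.most_common(topk)}
    return {"layer_stats": layer_stats, "layer_profile": layer_profile, "heatmap": heatmap}


# Two hypothetical probes, layers 0 and 9, three features
acts = {
    "h1": {0: [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0]], 9: [[1.0, 1.0, 1.0]]},
    "h2": {0: [[0.0, 0.0, 3.0]], 9: [[0.0, 1.0, 0.0]]},
}
stats = sae_stats(acts, topk=1)
print(stats["heatmap"])        # {'h1': {0: 1.5, 9: 3.0}, 'h2': {0: 1.0, 9: 1.0}}
print(stats["layer_profile"])  # {0: {'mean_l0': 1.25}, 9: {'mean_l0': 2.0}}
print(stats["layer_stats"][0]) # {'top_features': [(2, 3)]}
```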

aquin umap

agent tool: ensure_umap_loaded

Loads the UMAP projection of all SAE decoder directions and opens the UMAP Explorer panel on the web. Requires a precomputed UMAP file for the pulled SAE.
