Checkpoint SAE diff, weight diff & residual drift
Post-training interpretability on real fine-tuned checkpoints. Compare base vs. checkpoint activations through the public SAE (aquin sae diff), and analyze per-layer weight deltas (aquin weight-diff), activation drift (aquin residual-drift), pre-merge LoRA gates (aquin merge-analysis), and multi-checkpoint trajectories (aquin trajectory-analysis). Works for LLMs (token-mean residual) and embedding models (mean-pooled hidden state). For activation capture, temp SAE training, and alignment, see SAE training (/docs/sae-training). Requirements: a GPU, a session started with aquin session start --model, and aquin load sae for web mirror cards.
6 commands
aquin weight-diff
agent tool: run_weight_diff
Per-layer weight delta between the catalog base model and a fine-tuned checkpoint. LLMs report Q/K/V/O/MLP ‖ΔW‖ and stable rank of the change (full weights via TransformerLens, or LoRA effective B@A when the checkpoint contains adapter keys). Embedding models diff HF encoder weights grouped by layer. Syncs a weightDiff card to the web orchestrator.
| Flag | Description |
|---|---|
| --checkpoint* | Path to merged .pt checkpoint or HF save_pretrained directory. |
| --name | Label for checkpoint in output and web card (default: filename stem). |
| --save | Write schema_version=1 JSON export. |
| --output json | Print raw JSON to stdout. |
Uses the loaded session model as base. No --model override.
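The two quantities reported per matrix can be illustrated with a minimal NumPy sketch. It assumes the usual definition of stable rank (squared Frobenius norm divided by squared spectral norm); whether aquin computes it exactly this way is not stated here. The helper names delta_stats and lora_delta, and the scale argument, are hypothetical.

```python
import numpy as np

def delta_stats(w_base, w_ft):
    """Frobenius norm and stable rank of the change W_ft - W_base."""
    delta = w_ft - w_base
    fro = float(np.linalg.norm(delta, "fro"))
    spec = float(np.linalg.norm(delta, 2))  # largest singular value
    # Stable rank = ||dW||_F^2 / ||dW||_2^2 (assumed definition).
    stable_rank = (fro / spec) ** 2 if spec > 0 else 0.0
    return {"fro_norm": fro, "stable_rank": stable_rank}

def lora_delta(a, b, scale=1.0):
    """Effective LoRA update B @ A; `scale` is a hypothetical alpha/r factor."""
    return scale * (b @ a)

rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64))
a = rng.normal(size=(4, 64))   # LoRA A: (r, d_in)
b = rng.normal(size=(64, 4))   # LoRA B: (d_out, r)
stats = delta_stats(w_base, w_base + lora_delta(a, b))
print(stats)
```

Because a rank-r LoRA update has at most r nonzero singular values, its stable rank cannot exceed r, which is why the value is a useful sanity check on adapter checkpoints.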
aquin merge-analysis
agent tool: run_merge_analysis
Pre-merge LoRA gate: runs weight-diff, flags matrices with elevated stable rank (rank/collapse signals), and optionally runs a behavioral model-diff on probe generations (LLM only). Returns a mergeVerdict of pass, warn, or fail, and exits with code 2 on fail. Syncs a mergeAnalysis card.
| Flag | Description |
|---|---|
| --checkpoint* | Path to adapter or merged .pt checkpoint. |
| --name | Label for checkpoint in output and web card. |
| --prompts | JSON array or JSONL probes for behavioral diff (LLM). |
| --no-behavioral | Skip generation-based behavioral scores (faster). |
| --save | Write schema_version=1 JSON export. |
| --output json | Print raw JSON to stdout. |
Embedding mode: weight-only verdict (no behavioral diff). Uses loaded session model as base.
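Since a fail verdict is signalled through exit code 2, the command can act as a gate in a script or CI job. The sketch below is one possible wrapper, using only the flags from the table above. The helper names are hypothetical, and treating exit code 0 as "pass or warn" and any other code as a tool error are assumptions, not documented behavior.

```python
import subprocess

def merge_analysis_argv(checkpoint, behavioral=True):
    """Build the CLI call from documented flags only."""
    argv = ["aquin", "merge-analysis", "--checkpoint", checkpoint, "--output", "json"]
    if not behavioral:
        argv.append("--no-behavioral")  # skip generation-based scores
    return argv

def gate_decision(returncode):
    """Map the process exit code to a merge decision."""
    if returncode == 2:
        return "block"    # documented: exit code 2 on a fail verdict
    if returncode == 0:
        return "proceed"  # assumption: pass and warn both exit 0
    return "error"        # assumption: anything else is a tool failure

def run_gate(checkpoint, behavioral=True):
    """Run the gate; returns (decision, raw stdout). Needs aquin on PATH."""
    proc = subprocess.run(
        merge_analysis_argv(checkpoint, behavioral),
        capture_output=True,
        text=True,
    )
    return gate_decision(proc.returncode), proc.stdout

# Usage (requires a GPU and a running session):
#   decision, raw_json = run_gate("adapter.pt", behavioral=False)
```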
aquin trajectory-analysis
agent tool: run_trajectory_analysis
Training trajectory: weight-diff summary for each checkpoint vs base, sorted by training step when present in the .pt file. Use --checkpoints <glob> or --dir to scan a checkpoint folder. Syncs a trajectoryAnalysis card with a step table and sparkline.
| Flag | Description |
|---|---|
| --checkpoints | Glob of .pt checkpoints (e.g. ~/runs/checkpoints/step_*.pt). |
| --dir | Recursively scan directory for *.pt files. |
| --name | Optional prefix for step labels. |
| --save | Write schema_version=1 JSON export. |
| --output json | Print raw JSON to stdout. |
Provide exactly one of --checkpoints or --dir. Uses loaded session model as base.
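The step ordering can be pictured with a small sketch. It assumes each checkpoint has already been reduced to a (path, step) pair, with step set to None when the .pt file carries no step. Placing step-less checkpoints last, ordered by filename, is an assumption made for illustration; the tool's actual fallback is not described here.

```python
from pathlib import Path

def order_checkpoints(entries):
    """Sort (path, step) pairs by step; entries without a step go last, by filename."""
    def key(entry):
        path, step = entry
        has_no_step = step is None
        return (has_no_step, 0 if has_no_step else step, Path(path).name)
    return sorted(entries, key=key)

entries = [
    ("runs/step_200.pt", 200),
    ("runs/final.pt", None),   # no step recorded in the file
    ("runs/step_50.pt", 50),
]
ordered = [path for path, _ in order_checkpoints(entries)]
print(ordered)
```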
aquin residual-drift
agent tool: run_residual_drift
Per-layer activation drift between the catalog base model and a fine-tuned checkpoint on the same probe set. LLMs compare last-token hook_resid_post cosine distance per layer; embedding models compare mean-pooled hidden states per encoder layer. Complements weight-diff (weight space) and sae diff (sparse feature space). Syncs a residualDrift card to the web orchestrator.
| Flag | Description |
|---|---|
| --checkpoint* | Path to merged .pt checkpoint or HF save_pretrained directory. |
| --prompts | JSON array or JSONL probe strings (default: built-in short prompts). |
| --name | Label for checkpoint in output and web card (default: filename stem). |
| --save | Write schema_version=1 JSON export. |
| --output json | Print raw JSON to stdout. |
Uses the loaded session model id for catalog base weights. No --model override.
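As a rough illustration of the metric, the sketch below computes per-layer cosine distance between base and fine-tuned activations that have already been captured. The array layout (layers, prompts, d_model) and the averaging over prompts are assumptions; the function name layer_drift is hypothetical.

```python
import numpy as np

def layer_drift(base_acts, ft_acts, eps=1e-12):
    """Mean cosine distance per layer.

    base_acts, ft_acts: arrays of shape (n_layers, n_prompts, d_model),
    e.g. last-token residuals or mean-pooled hidden states.
    """
    dot = np.sum(base_acts * ft_acts, axis=-1)
    norms = np.linalg.norm(base_acts, axis=-1) * np.linalg.norm(ft_acts, axis=-1)
    cos = dot / np.maximum(norms, eps)  # guard against zero vectors
    return 1.0 - cos.mean(axis=-1)      # average over prompts (assumption)

rng = np.random.default_rng(0)
base = rng.normal(size=(3, 5, 8))
ft = base.copy()
ft[2] = -base[2]  # flip the last layer to simulate maximal drift
drift = layer_drift(base, ft)
print(drift)
```

A distance near 0 means the layer's activations point the same way as in the base model; values approaching 2 mean they point in the opposite direction.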
aquin sae diff
agent tool: run_sae_diff
Load the catalog base model and a fine-tuned checkpoint, run the same prompts through the public SAE, and report per-feature activation deltas (changed count, mean/max |Δ|, top features). LLMs use TransformerLens residuals; embedding models use mean-pooled layer activations (default layer 11). Syncs a saeDiff card to the web orchestrator.
| Flag | Description |
|---|---|
| --model* | Catalog model slug (e.g. llama-3.2-1b). |
| --checkpoint* | Path to merged .pt checkpoint or HF save_pretrained directory. |
| --prompts | JSON array or JSONL of probe strings / {instruction, response} rows. |
| --layer | SAE layer (default: from model config). |
| --sae | Custom SAE weights path instead of pulled public SAE. |
| --name | Label for checkpoint in output and web card (default: checkpoint filename). |
| --output | Write full JSON payload to disk. |
Checkpoint format: { step, state_dict }, as written by run.checkpoint() or fixtures/e2e/scripts/train_lora_e2e.py. Like other GPU tools, the command starts a local engine server on localhost:8002.
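The reported statistics can be sketched as follows, assuming the SAE feature activations for both models are already available as (n_prompts, n_features) arrays. The nChanged and meanAbsDelta names follow the simulate log fields; maxAbsDelta, topFeatures, the change threshold, and top_k are hypothetical choices for this illustration.

```python
import numpy as np

def sae_feature_delta(base_feats, ft_feats, threshold=1e-3, top_k=3):
    """Per-feature activation deltas between fine-tuned and base SAE features."""
    # Average each feature over prompts, then difference the two models.
    delta = ft_feats.mean(axis=0) - base_feats.mean(axis=0)
    abs_delta = np.abs(delta)
    top = np.argsort(-abs_delta)[:top_k]  # largest |delta| first
    return {
        "nChanged": int((abs_delta > threshold).sum()),
        "meanAbsDelta": float(abs_delta.mean()),
        "maxAbsDelta": float(abs_delta.max()),
        "topFeatures": [(int(i), float(delta[i])) for i in top],
    }

base_feats = np.zeros((2, 4))
ft_feats = np.array([[0.0, 0.5, 0.0, -1.0],
                     [0.0, 0.5, 0.0, -1.0]])
report = sae_feature_delta(base_feats, ft_feats, top_k=2)
print(report)
```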
aquin simulate (saeDiff)
agent tool: run_simulation
At the end of aquin simulate, the pipeline runs an SAE diff between the base model and the NTK-linearized synthetic checkpoint. The stream logs [simulate] SAE diff: … with nChanged / meanAbsDelta.
The checkpoint is synthetic, so the result is not the same as sae diff on a real LoRA checkpoint. See Simulation (LLM) (/docs/simulation/llm) for the full simulate flags.
Typical workflow
After fine-tuning (with your own trainer plus run.checkpoint(), or the E2E fixture train_lora_e2e.py), the usual sequence is: capture probes → diff → temp train → align. Activation capture and sae train are documented under SAE training.
Web mirror
Each command pushes tool.start / tool.result to your session. Cards:
- sae diff — changed features, top deltas, base vs FT table
- weight diff — per-layer ‖ΔW‖, matrix-type breakdown, top changed weights
- merge analysis — pre-merge verdict, rank/collapse signals, behavioral scores (LLM)
- trajectory analysis — multi-checkpoint ‖ΔW‖ over training steps
- residual drift — per-layer cosine distance on activations (last-token resid or pooled hidden)
- sae train — layer, quick/full, output path
- sae align — mean cosine, weakest/strongest decoder matches
A collapsible sync row shows Invoked / Completed. The sync payload carries a slimmed version of the full JSON; the details live on the card.
vs simulate & watch
| | aquin sae diff | aquin simulate | aquin watch |
|---|---|---|---|
| Checkpoint | Real merged .pt from training | Synthetic NTK-linearized weights | No weights — metrics JSONL only |
| GPU | Required | Required | Not required |
| Web card | saeDiff | Simulation + saeDiff in stream | training.watch.* |
