
Checkpoint SAE diff, weight diff & residual drift

Post-training interpretability on real fine-tuned checkpoints. Compare base and checkpoint activations through the public SAE (loaded with aquin load sae), analyze per-layer weight deltas (aquin weight-diff) and activation drift (aquin residual-drift), gate LoRA adapters before merging (aquin merge-analysis), and follow weight changes across multiple checkpoints (aquin trajectory-analysis). Works for LLMs (token-mean residual) and embedding models (mean-pooled hidden state). For activation capture, temp SAE training, and alignment, see SAE training (/docs/sae-training). Requirements: a GPU, a session started with aquin session start --model, and, for web mirror cards, a loaded SAE (aquin load sae).

Prerequisites
  • LLM: aquin session start --id my-run --model llama-3.2-1b, then aquin load sae llama-3.2-1b-l8
  • Embedding: aquin session start --id my-run --model gte-small, then aquin load sae gte-small-l11

6 commands

aquin weight-diff

agent tool: run_weight_diff

Per-layer weight delta between the catalog base model and a fine-tuned checkpoint. LLMs report Q/K/V/O/MLP ‖ΔW‖ and stable rank of the change (full weights via TransformerLens, or LoRA effective B@A when the checkpoint contains adapter keys). Embedding models diff HF encoder weights grouped by layer. Syncs a weightDiff card to the web orchestrator.

Flag            Description
--checkpoint    Required. Path to a merged .pt checkpoint or an HF save_pretrained directory.
--name          Label for the checkpoint in output and on the web card (default: filename stem).
--save          Write a schema_version=1 JSON export.
--output json   Print raw JSON to stdout.

Uses the loaded session model as the base; there is no --model override.
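A minimal sketch of the two per-matrix quantities weight-diff reports, assuming ‖ΔW‖ is the Frobenius norm and stable rank follows the common definition ‖ΔW‖_F² / ‖ΔW‖_2². It also shows the LoRA effective delta B @ A. The function names, the state-dict layout, and the optional `scale` factor are hypothetical illustrations, not Aquin's implementation.

```python
import numpy as np


def frobenius_norm(delta: np.ndarray) -> float:
    """‖ΔW‖_F: square root of the sum of squared entries."""
    return float(np.linalg.norm(delta, ord="fro"))


def stable_rank(delta: np.ndarray, eps: float = 1e-12) -> float:
    """Stable rank ‖ΔW‖_F² / ‖ΔW‖_2² (assumed definition).

    Near 1 when a single direction dominates the update; equals the
    matrix rank when all nonzero singular values are equal.
    """
    spectral = np.linalg.norm(delta, ord=2)  # largest singular value
    if spectral < eps:
        return 0.0  # no change at all
    return float(frobenius_norm(delta) ** 2 / spectral ** 2)


def lora_effective_delta(b: np.ndarray, a: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """LoRA update ΔW = scale * (B @ A); B is (d_out, r), A is (r, d_in).

    `scale` is a hypothetical knob; PEFT-style adapters use lora_alpha / r.
    """
    return scale * (b @ a)


def diff_matrices(base: dict, tuned: dict) -> dict:
    """Per-matrix report for names present in both (hypothetical) state dicts."""
    report = {}
    for name in sorted(base.keys() & tuned.keys()):
        delta = tuned[name] - base[name]
        report[name] = {"norm": frobenius_norm(delta), "stable_rank": stable_rank(delta)}
    return report


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b, a = rng.normal(size=(64, 4)), rng.normal(size=(4, 64))
    delta = lora_effective_delta(b, a)
    print("rank:", np.linalg.matrix_rank(delta))  # at most r = 4
    print("stable rank:", round(stable_rank(delta), 2))  # between 1 and 4
```

Stable rank never exceeds the true rank, so for a pure LoRA delta it is bounded by the adapter rank r.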

aquin merge-analysis

agent tool: run_merge_analysis

Pre-merge LoRA gate: runs weight-diff, flags matrices with elevated stable rank (rank/collapse signals), and optionally runs a behavioral model-diff on probe generations (LLM only). Returns mergeVerdict pass | warn | fail and exits with code 2 on fail. Syncs a mergeAnalysis card.

Flag              Description
--checkpoint      Required. Path to an adapter or merged .pt checkpoint.
--name            Label for the checkpoint in output and on the web card.
--prompts         JSON array or JSONL of probes for the behavioral diff (LLM only).
--no-behavioral   Skip generation-based behavioral scores (faster).
--save            Write a schema_version=1 JSON export.
--output json     Print raw JSON to stdout.

In embedding mode the verdict is weight-only (no behavioral diff). Uses the loaded session model as the base.
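One way to picture the weight-only half of the gate: flag matrices whose stable rank crosses a threshold, map the worst flag to a verdict, then map the verdict to a process exit code. The thresholds, the function names, and exit code 0 for pass and warn are assumptions for illustration; only the pass | warn | fail values and exit code 2 on fail come from the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds (warn_at <= fail_at); Aquin's actual cutoffs are not documented here.
WARN_STABLE_RANK = 8.0
FAIL_STABLE_RANK = 32.0


def merge_verdict(
    stable_ranks: Dict[str, float],
    warn_at: float = WARN_STABLE_RANK,
    fail_at: float = FAIL_STABLE_RANK,
) -> Tuple[str, List[str]]:
    """Return ("pass" | "warn" | "fail", sorted names of flagged matrices)."""
    flagged = sorted(name for name, sr in stable_ranks.items() if sr >= warn_at)
    if any(stable_ranks[name] >= fail_at for name in flagged):
        return "fail", flagged
    if flagged:
        return "warn", flagged
    return "pass", flagged


def exit_code(verdict: str) -> int:
    """fail -> 2 (as documented); pass and warn -> 0 (assumed)."""
    return 2 if verdict == "fail" else 0


if __name__ == "__main__":
    verdict, flagged = merge_verdict({"L3.attn.q": 2.1, "L3.mlp.out": 11.4})
    print(verdict, flagged, "exit code:", exit_code(verdict))
```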

aquin trajectory-analysis

agent tool: run_trajectory_analysis

Training trajectory: a weight-diff summary for each checkpoint vs the base, sorted by training step when the step is stored in the .pt file. Use --checkpoints <glob> to select files, or --dir to scan a checkpoint folder. Syncs a trajectoryAnalysis card with a step table and sparkline.

Flag            Description
--checkpoints   Glob of .pt checkpoints (e.g. ~/runs/checkpoints/step_*.pt).
--dir           Directory to scan recursively for *.pt files.
--name          Optional prefix for step labels.
--save          Write a schema_version=1 JSON export.
--output json   Print raw JSON to stdout.

Provide exactly one of --checkpoints or --dir. Uses the loaded session model as the base.
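A minimal sketch of the input rule, the step ordering, and the card's sparkline, assuming each checkpoint yields a (label, step, total ‖ΔW‖) tuple. Placing checkpoints without a stored step after the stepped ones (in label order), the tuple layout, and the block-character sparkline are assumptions, not documented behavior; the function names are hypothetical.

```python
import glob
import os
from pathlib import Path
from typing import List, Optional, Tuple

# (label, training step or None, total ‖ΔW‖ vs base); hypothetical layout.
Entry = Tuple[str, Optional[int], float]

BARS = "▁▂▃▄▅▆▇█"


def collect_paths(checkpoints: Optional[str] = None, directory: Optional[str] = None) -> List[str]:
    """Mirror the 'exactly one of --checkpoints or --dir' rule."""
    if (checkpoints is None) == (directory is None):
        raise ValueError("provide exactly one of --checkpoints or --dir")
    if checkpoints is not None:
        # glob.glob does not expand "~", so expand it first.
        return sorted(glob.glob(os.path.expanduser(checkpoints)))
    return sorted(str(p) for p in Path(directory).expanduser().rglob("*.pt"))


def order_by_step(entries: List[Entry]) -> List[Entry]:
    """Ascending training step; entries without a step go last, by label (assumed)."""
    return sorted(entries, key=lambda e: (e[1] is None, e[1] or 0, e[0]))


def sparkline(values: List[float]) -> str:
    """Scale values to min..max and map each to one of eight block characters."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / (hi - lo) * top)] for v in values)


if __name__ == "__main__":
    runs = [("step_300", 300, 4.2), ("final", None, 5.0), ("step_100", 100, 1.1), ("step_200", 200, 2.9)]
    ordered = order_by_step(runs)
    for label, step, norm in ordered:
        print(f"{label:<9} step={step}  ‖ΔW‖={norm:.2f}")
    print(sparkline([norm for _, _, norm in ordered]))
```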

aquin residual-drift

agent tool: run_residual_drift

Per-layer activation drift between the catalog base model and a fine-tuned checkpoint on the same probe set. For LLMs it measures the cosine distance between last-token hook_resid_post activations at each layer; for embedding models it compares mean-pooled hidden states at each encoder layer. Complements weight-diff (weight space) and sae diff (sparse feature space). Syncs a residualDrift card to the web orchestrator.

Flag            Description
--checkpoint    Required. Path to a merged .pt checkpoint or an HF save_pretrained directory.
--prompts       JSON array or JSONL of probe strings (default: built-in short prompts).
--name          Label for the checkpoint in output and on the web card (default: filename stem).
--save          Write a schema_version=1 JSON export.
--output json   Print raw JSON to stdout.

The loaded session model id selects the catalog base weights; there is no --model override.
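A sketch of the per-layer comparison, assuming each prompt's activations are already captured as an unpadded array of shape (n_layers, n_tokens, d_model) and that per-prompt distances are averaged over the probe set. Last-token selection mirrors the LLM mode described above and mean pooling mirrors the embedding mode; the array layout, the averaging, and the function names are assumptions.

```python
from typing import List

import numpy as np


def pool(acts: np.ndarray, mode: str) -> np.ndarray:
    """Reduce one prompt's activations (n_layers, n_tokens, d) to (n_layers, d).

    mode="last": final token position (LLM mode).
    mode="mean": average over all tokens (embedding mode; prompt assumed unpadded).
    """
    if mode == "last":
        return acts[:, -1, :]
    if mode == "mean":
        return acts.mean(axis=1)
    raise ValueError(f"unknown mode: {mode!r}")


def cosine_distance(x: np.ndarray, y: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Row-wise 1 - cos(x, y) for two (n_layers, d) arrays; 0 means same direction."""
    dot = (x * y).sum(axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1)
    return 1.0 - dot / (norms + eps)


def residual_drift(base_runs: List[np.ndarray], tuned_runs: List[np.ndarray], mode: str = "last") -> np.ndarray:
    """Per-layer cosine distance, averaged over the probe set (assumed aggregation)."""
    per_prompt = [cosine_distance(pool(b, mode), pool(t, mode)) for b, t in zip(base_runs, tuned_runs)]
    return np.mean(np.stack(per_prompt), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = [rng.normal(size=(4, 6, 16)) for _ in range(3)]
    # Synthetic drift that grows with depth: later layers get more added noise.
    tuned = [b + np.linspace(0.0, 1.0, 4)[:, None, None] * rng.normal(size=b.shape) for b in base]
    print(np.round(residual_drift(base, tuned), 3))
```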

aquin sae diff

agent tool: run_sae_diff

Loads the catalog base model and a fine-tuned checkpoint, runs the same prompts through the public SAE, and reports per-feature activation deltas (changed count, mean/max |Δ|, top features). LLMs use TransformerLens residuals; embedding models use mean-pooled layer activations (default layer 11). Syncs a saeDiff card to the web orchestrator.

Flag           Description
--model        Required. Catalog model slug (e.g. llama-3.2-1b).
--checkpoint   Required. Path to a merged .pt checkpoint or an HF save_pretrained directory.
--prompts      JSON array or JSONL of probe strings or {instruction, response} rows.
--layer        SAE layer (default: from the model config).
--sae          Path to custom SAE weights to use instead of the pulled public SAE.
--name         Label for the checkpoint in output and on the web card (default: checkpoint filename).
--output       Write the full JSON payload to disk.

Checkpoint format: a { step, state_dict } dict, as written by run.checkpoint() or fixtures/e2e/scripts/train_lora_e2e.py. Like the other GPU tools, it starts a local engine server on localhost:8002.
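A minimal sketch of the per-feature comparison, assuming a standard ReLU SAE encoder f(x) = ReLU((x - b_dec) W_enc + b_enc), one pooled activation vector per probe, and feature activations averaged over the probe set. The public SAE's exact architecture, the "changed" threshold, mean |Δ| taken over all features, and the field names are assumptions; the outputs mirror the changed count, mean/max |Δ|, and top features listed above.

```python
from typing import Dict

import numpy as np


def sae_encode(x: np.ndarray, w_enc: np.ndarray, b_enc: np.ndarray, b_dec: np.ndarray) -> np.ndarray:
    """Assumed ReLU SAE encoder: (n, d_model) -> (n, n_features)."""
    return np.maximum((x - b_dec) @ w_enc + b_enc, 0.0)


def sae_diff(
    base_acts: np.ndarray,
    tuned_acts: np.ndarray,
    w_enc: np.ndarray,
    b_enc: np.ndarray,
    b_dec: np.ndarray,
    threshold: float = 1e-3,
    top_k: int = 5,
) -> Dict:
    """Compare mean SAE feature activations of base vs tuned on the same probes.

    base_acts / tuned_acts: (n_prompts, d_model), one pooled vector per probe.
    """
    base_f = sae_encode(base_acts, w_enc, b_enc, b_dec).mean(axis=0)
    tuned_f = sae_encode(tuned_acts, w_enc, b_enc, b_dec).mean(axis=0)
    delta = tuned_f - base_f
    abs_delta = np.abs(delta)
    top = np.argsort(-abs_delta, kind="stable")[:top_k]  # largest |Δ| first
    return {
        "n_changed": int((abs_delta > threshold).sum()),  # hypothetical threshold
        "mean_abs_delta": float(abs_delta.mean()),  # averaged over all features (assumed)
        "max_abs_delta": float(abs_delta.max()),
        "top_features": [(int(i), float(delta[i])) for i in top],  # (feature id, signed Δ)
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_feat, n_prompts = 16, 64, 8
    w_enc, b_enc, b_dec = rng.normal(size=(d_model, n_feat)), np.zeros(n_feat), np.zeros(d_model)
    base = rng.normal(size=(n_prompts, d_model))
    tuned = base + 0.3 * rng.normal(size=base.shape)
    print(sae_diff(base, tuned, w_enc, b_enc, b_dec, top_k=3))
```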

aquin simulate (saeDiff)

agent tool: run_simulation

At the end of aquin simulate, the pipeline runs an SAE diff between the base model and the NTK-linearized synthetic checkpoint. The stream logs [simulate] SAE diff: … with nChanged / meanAbsDelta. See Simulation (LLM) for the full list of simulate flags.

This diff uses a synthetic checkpoint, so it is not the same as sae diff on a real LoRA checkpoint. See /docs/simulation/llm.

Typical workflow

After fine-tuning (with your own trainer plus run.checkpoint(), or with the E2E fixture train_lora_e2e.py), the pipeline is: capture probes → diff → temp train → align. Activation capture and sae train are covered under SAE training.
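Every diff step in this workflow starts from a probe set. The --prompts flags on this page accept a JSON array or JSONL of probe strings, and sae diff also accepts {instruction, response} rows. A minimal loader sketch that normalizes those layouts into plain strings; the function names and the newline join for instruction/response rows are hypothetical, and the CLI's own parsing may differ.

```python
import json
from pathlib import Path
from typing import List, Union


def _row_to_text(row) -> str:
    """Accept a probe string or an {instruction, response} row."""
    if isinstance(row, str):
        return row
    if isinstance(row, dict) and "instruction" in row:
        # Hypothetical joining rule; the CLI's own formatting is not documented here.
        parts = (row["instruction"], row.get("response", ""))
        return "\n".join(p for p in parts if p)
    raise ValueError(f"unsupported probe row: {row!r}")


def parse_probes(text: str) -> List[str]:
    """Parse a JSON array or JSONL body into a list of probe strings."""
    text = text.strip()
    if text.startswith("["):
        rows = json.loads(text)  # JSON array
    else:
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]  # JSONL
    return [_row_to_text(r) for r in rows]


def load_probes(path: Union[str, Path]) -> List[str]:
    """Read a probe file from disk and normalize it."""
    return parse_probes(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    jsonl = '"Summarize LoRA in one line."\n{"instruction": "Define stable rank.", "response": "A smooth rank proxy."}\n'
    print(parse_probes(jsonl))
```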

[Diagram: post-training SAE pipeline]

Web mirror

Each command pushes tool.start / tool.result to your session. Cards:

  • sae diff — changed features, top deltas, base vs FT table
  • weight diff — per-layer ‖ΔW‖, matrix-type breakdown, top changed weights
  • merge analysis — pre-merge verdict, rank/collapse signals, behavioral scores (LLM)
  • trajectory analysis — multi-checkpoint ‖ΔW‖ over training steps
  • residual drift — per-layer cosine distance on activations (last-token resid or pooled hidden)
  • sae train — layer, quick/full, output path
  • sae align — mean cosine, weakest/strongest decoder matches

A collapsible sync row shows Invoked / Completed. The full JSON is slimmed down in the sync payload; the details live on the card.

vs simulate & watch

              aquin sae diff                  aquin simulate                    aquin watch
Checkpoint    Real merged .pt from training   Synthetic NTK-linearized weights  No weights (metrics JSONL only)
GPU           Required                        Required                          Not required
Web card      saeDiff                         Simulation + saeDiff in stream    training.watch.*